Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return MD5 of existing files on open #145

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

zyx-billy
Copy link
Member

Fixes #137 (involves reverting #113).

#113 changed the behavior of open so that it always returns an empty md5 dict. This means the client will always send all their files each time. Interestingly, md5 checking is done during upload requests. Not only is this inefficient, it is not even needed: If the client already sent us their latest version of the file, there's no point in checking md5 at this point, just save it. The entire point of md5 was so that the client wouldn't need to send us everything.

Changes proposed in this PR:

  • Send back hashes of existing files during an open request. This way the client can compare checksums and only send over new and updated files.
  • The status file_exists is removed since it's not useful or even worth it to compare checksums during an upload request. The client should be able to make the choice of what files it want uploaded. And it makes no difference in the end if the client uploaded an existing file. (In fact, if by some odd chance, two files resulted in the same md5, this would be the only way it could get updated).

@cg2v
Copy link
Member

cg2v commented Nov 20, 2017

There is some value in not repeatedly overwriting the common files on each grade request. If you do overwrite them, you risk having the vmms copying a partial file if another tango thread is writing at the same time the vmms is reading (or running scp). You could fix that by using a .NEW file and renaming if you really wanted to.

The file list was removed from tango because it caused performance problems when assignment directories had alot of items in them. That's where the excessive computations server side occur.

@zyx-billy
Copy link
Member Author

I see. In that case, I agree we should not haphazardly overwrite existing files. We should probably implement a solution for replacing files since it does occur even right now, admittedly very rarely. However, at the same time, this seems like something we could minimize by letting the client know what files we already have so they don't even need to send it. If the only problem with this is too much computation during open, have we considered storing the calculated md5s (perhaps in each courselab directory?), and just send that over during open (and obviously, they'll be updated each time upload gets something new).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants