RFC: Add keysize method #28
base: master
Hi, thanks for your PR! I really like the idea of putting commonly needed functions into the library. I've been planning to move some generic and reusable stuff from git-annex-remote-googledrive to this library, too. But I want to keep them separate from the protocol functions. All public methods of I propose adding a
OK, I thought about it some more, and where I'm actually heading with My ideas so far:
The latter would be a very good place to add this function. It could just have a I haven't gotten around to implementing any of this yet, but it should be quite straightforward and would only add to the existing functionality without breaking anything. If instances of You are very welcome to add a
Here's a draft.
Branch updated from 24858f8 to aa763b7.
Sorry for the delay. I was quite busy the last few weeks. I'm going to look into it today!
I've used this Key class a little since I wrote it. I noticed the following:
keyparts = keyparts.split('-')

# chunking
if len(keyparts) > 2 and keyparts[-1].startswith('C') and keyparts[-2].startswith('S'):
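The positional check above only recognizes a chunk key when the `S` and `C` fields are the last two. As a hedged alternative, here is a minimal sketch that collects every numeric field regardless of order. It assumes git-annex's key layout `BACKEND-sSIZE-mMTIME-SCHUNKSIZE-CCHUNKNUM--NAME` (all fields optional); `parse_key` and `KeyFields` are hypothetical names, not part of the library.

```python
from typing import NamedTuple, Optional


class KeyFields(NamedTuple):
    backend: str
    name: Optional[str]
    size: Optional[int]        # 's' field: size in bytes
    mtime: Optional[int]       # 'm' field
    chunk_size: Optional[int]  # 'S' field
    chunk_num: Optional[int]   # 'C' field


def parse_key(key: str) -> KeyFields:
    """Hypothetical helper: split a git-annex key into its fields."""
    # Everything after the first '--' is the key name, which may itself contain dashes.
    head, sep, name = key.partition('--')
    backend, *fields = head.split('-')
    values = {}
    for field in fields:
        # Only accept the single-letter numeric fields; order does not matter.
        if field and field[0] in 'smSC' and field[1:].isdigit():
            values[field[0]] = int(field[1:])
    return KeyFields(
        backend=backend,
        name=name if sep else None,
        size=values.get('s'),
        mtime=values.get('m'),
        chunk_size=values.get('S'),
        chunk_num=values.get('C'),
    )
```

A key counts as a chunk key when both `chunk_size` and `chunk_num` are not `None`, independent of where those fields appear.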
I really like the idea of making every piece of information we have about the key easily accessible!
def __repr__(self):
    return str(self)

class UriList(collections.abc.MutableSet):
That's a clever way to handle the URIs!
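To illustrate why a `MutableSet` subclass fits here, a minimal sketch under stated assumptions: `FakeAnnex` is a hypothetical stand-in for the annex connection, `geturls(key, prefix)` mirrors the call in the diff, and `seturlpresent`/`seturlmissing` are assumed to mirror git-annex's SETURLPRESENT/SETURLMISSING messages. Subclasses of `collections.abc.MutableSet` only need `__contains__`, `__iter__`, `__len__`, `add` and `discard`; the mixin supplies the rest.

```python
from collections.abc import MutableSet


class FakeAnnex:
    """Hypothetical in-memory stand-in for the annex; counts GETURLS round-trips."""

    def __init__(self):
        self.urls = {}
        self.calls = 0

    def geturls(self, key, prefix):
        self.calls += 1
        return [u for u in self.urls.get(key, []) if u.startswith(prefix)]

    def seturlpresent(self, key, url):
        self.urls.setdefault(key, []).append(url)

    def seturlmissing(self, key, url):
        self.urls[key].remove(url)


class UriList(MutableSet):
    """Set-like view of the URLs recorded for one key (sketch)."""

    def __init__(self, key, annex, prefix=''):
        self._key = key
        self._annex = annex
        self._prefix = prefix
        self._uris = None  # loaded lazily on first access

    @property
    def uris(self):
        if self._uris is None:
            self._uris = set(self._annex.geturls(self._key, self._prefix))
        return self._uris

    @classmethod
    def _from_iterable(cls, it):
        # Operators like | and & build results through this hook; return a plain set.
        return set(it)

    def __contains__(self, uri):
        return uri in self.uris

    def __iter__(self):
        return iter(self.uris)

    def __len__(self):
        return len(self.uris)

    def add(self, uri):
        if uri not in self.uris:
            self._annex.seturlpresent(self._key, uri)
            self.uris.add(uri)

    def discard(self, uri):
        if uri in self.uris:
            self._annex.seturlmissing(self._key, uri)
            self.uris.discard(uri)
```

Overriding `_from_iterable` matters because the mixin otherwise calls `UriList(iterable)`, which would fail with this constructor signature.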
@property
def uris(self):
    if self._uris is None:
        self._uris = self._annex.geturls(self._key, self._prefix)
Once this has been called, there's effectively caching in place, so we might as well cache the state, too.
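Caching the state the same way as the URIs could look like the sketch below. It is an assumption-laden illustration: `FakeAnnex` is hypothetical, and `getstate`/`setstate` are assumed to mirror git-annex's GETSTATE/SETSTATE messages, with an empty string meaning no state has been stored. The key takes an `annex` rather than a remote, as suggested later in this thread.

```python
class FakeAnnex:
    """Hypothetical in-memory stand-in for the annex; counts protocol round-trips."""

    def __init__(self):
        self.states = {}
        self.round_trips = 0

    def getstate(self, key):
        self.round_trips += 1
        return self.states.get(key, '')  # empty string: nothing stored yet

    def setstate(self, key, value):
        self.round_trips += 1
        self.states[key] = value


class Key:
    """Sketch of a key whose state is read once and then served from a cache."""

    def __init__(self, key, annex):
        self.key = key
        self.annex = annex
        self._state = None  # None means "not read yet"

    @property
    def state(self):
        if self._state is None:
            self._state = self.annex.getstate(self.key)
        return self._state

    @state.setter
    def state(self, value):
        # Write through to the annex, then keep the cache in sync.
        self.annex.setstate(self.key, value)
        self._state = value
```

Repeated reads cost one round-trip in total; the trade-off is that changes made by another process after the first read are not seen.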
    return UriList(self.key, self.remote.annex, prefix)

@property
def state(self):
In git-annex-remote-googledrive I use JSON to store arbitrary values in the state. I figure this could be generally useful, so I'd like to provide an optional class with a nice interface at some point. What do you think?
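One possible shape for such an optional class, as a minimal sketch: `JsonState` and `FakeAnnex` are hypothetical names, and `getstate`/`setstate` are assumed to mirror git-annex's GETSTATE/SETSTATE messages. Because the protocol is line-based, the encoded value must not contain a newline; `json.dumps` without `indent` escapes newlines inside strings, so its output stays on one line.

```python
import json


class FakeAnnex:
    """Hypothetical in-memory stand-in for the annex state store."""

    def __init__(self):
        self.states = {}

    def getstate(self, key):
        return self.states.get(key, '')

    def setstate(self, key, value):
        self.states[key] = value


class JsonState:
    """Stores a JSON-serializable dict in a key's state (sketch)."""

    def __init__(self, key, annex):
        self.key = key
        self.annex = annex

    def load(self):
        raw = self.annex.getstate(self.key)
        # An empty state means nothing has been stored yet.
        return json.loads(raw) if raw else {}

    def save(self, data):
        # Compact separators keep the protocol line short; sort_keys makes output stable.
        encoded = json.dumps(data, separators=(',', ':'), sort_keys=True)
        self.annex.setstate(self.key, encoded)
```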
Accompanying thoughts regarding the state in general:
1. Caching: read the state only the first time and use the cached value afterwards, updating it only when the setter is called. This would save messaging round-trips with git-annex if the remote uses the state multiple times. But this approach also assumes we're the only process accessing the state, which might not be a safe assumption to make.
2. Like (1), but before writing, check whether the value in the annex equals the cached value and raise an exception if it doesn't. => Saves round-trips when reading, adds an extra one when writing to be safer.
3. Like (2), but also re-read from the annex on every read. => None of the performance benefits, but less chance of overwriting changes than (4).
4. Do none of this and let git-annex take care of merging the state, i.e. the last change always wins, and what we read is what is actually there.
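Option (2) could be sketched as follows, under the same assumptions as before: `FakeAnnex`, `CheckedState` and `StateConflict` are hypothetical names, and `getstate`/`setstate` are assumed to mirror GETSTATE/SETSTATE. Note that the check is not atomic: another process can still write between the GETSTATE and the SETSTATE, so this narrows the window for lost updates without closing it.

```python
class StateConflict(Exception):
    """Raised when the state in the annex changed since we last read it."""


class FakeAnnex:
    """Hypothetical in-memory stand-in for the annex state store."""

    def __init__(self):
        self.states = {}

    def getstate(self, key):
        return self.states.get(key, '')

    def setstate(self, key, value):
        self.states[key] = value


class CheckedState:
    """Cached reads, plus a compare-before-write check on every write (sketch)."""

    def __init__(self, key, annex):
        self.key = key
        self.annex = annex
        self._cached = None  # None means "not read yet"

    def get(self):
        if self._cached is None:
            self._cached = self.annex.getstate(self.key)
        return self._cached

    def set(self, value):
        current = self.annex.getstate(self.key)  # the extra round-trip
        if self._cached is not None and current != self._cached:
            raise StateConflict(
                f'state of {self.key!r} changed from {self._cached!r} to {current!r}')
        self.annex.setstate(self.key, value)
        self._cached = value
```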
Sometimes it's best not to overcomplicate stuff. However, with hundreds of thousands of keys, I found it crucial to optimize away all the unnecessary round-trips with git-annex. I think it's a question of how much work we want the library to take away from the remote application. I mean, stuff like caching can always be made configurable.
'''
Attributes
----------
remote : SpecialRemote
This is the only part I'm not yet convinced of. Do you see any benefit in tying Key to SpecialRemote instead of Master? A key is something git-annex passes to the remote, after all, and the remote already knows itself.
I think I'd prefer annex: Master here.
(I want to get rid of Master eventually and instead use more descriptive names like Annex and Protocol.)
It looked so nice, but yeah, let's move them to properties or methods.
Oh no! I would have loved to just swap it out. Well, if this is a breaking change, then I guess we're approaching version
All your recommendations seem fine. I haven't applied them yet because I'm away from the remote I was working on, and I want to test the changes there to make sure I didn't introduce an error.
Hi Lykos,
What do you think of adding this utility method to extract the data size from keys?
I'd like to switch to using your professional-looking library for git-annex remotes, and I use this method a lot.
I haven't tested this code yet, but I likely will soon. Just looking for your input.
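For reference, a minimal sketch of what a key-size utility could look like; it is not the PR's actual code. `keysize` and `chunk_data_size` are hypothetical names. The chunk arithmetic assumes that a chunk key's `s` field still holds the size of the whole object, that chunks are `S` bytes each and numbered from 1, and that the last chunk holds the remainder.

```python
from typing import Dict, Optional


def _numeric_fields(key: str) -> Dict[str, int]:
    """Collect single-letter numeric fields (s, m, S, C) before the '--' separator."""
    head = key.partition('--')[0]
    fields = {}
    for field in head.split('-')[1:]:  # skip the backend name
        if len(field) > 1 and field[1:].isdigit():
            fields[field[0]] = int(field[1:])
    return fields


def keysize(key: str) -> Optional[int]:
    """Size in bytes from the key's 's' field, or None if the key has none."""
    return _numeric_fields(key).get('s')


def chunk_data_size(key: str) -> Optional[int]:
    """Bytes held by this chunk, under the assumptions stated above."""
    fields = _numeric_fields(key)
    size = fields.get('s')
    chunk_size, chunk_num = fields.get('S'), fields.get('C')
    if size is None or chunk_size is None or chunk_num is None:
        return size  # not a chunk key, or no size recorded
    start = (chunk_num - 1) * chunk_size
    return max(0, min(chunk_size, size - start))
```

For example, a 150-byte object split into 100-byte chunks would have a first chunk of 100 bytes and a second of 50.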