-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add __generation__ to metadata extractors to be able to tell one from another #351
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #351 +/- ##
==========================================
+ Coverage 86.42% 87.06% +0.64%
==========================================
Files 88 92 +4
Lines 4831 5333 +502
==========================================
+ Hits 4175 4643 +468
- Misses 656 690 +34
... and 6 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is generally a good idea to have a "user"-facing capability to distinguish extractor-interface-generations. But I think we should just use integers to identify the extractor generation. Especially since it is independent of the metadata storage format (we use legacy extractors to create new-style metadata). The generations would therefore identify two properties:
- how the extractor class must be used by
extract.py
(we get this information internally from the base classes though). - whether the extractor supports file-level extraction, dataset-level extraction, or both (we get this information internally from the base classes as well).
The first property is transparent to a user of the new metalad. The second property might be of interest to a user though. We might therefore just present the capabilities w.r.t file-level and dataset-level extraction instead of an extractor generation.
For example, we could have a property called __extraction_modes__
and values like ('dataset',)
, ('file',)
, or ('dataset', 'file')
. WDYT?
Unambiguous, I like it. |
Co-authored-by: Christian Mönch <[email protected]>
In order to resolve this quickly, I think it is best to discuss this topic separately. I added issue #365 to for that purpose. |
One more thing. The generation is basically interesting in order to know, which metalad-version can use the extractor. The latest version (>=0.3) can use generations 4, 3, and 2. The 0.2-version can use generations 3 and 2, and the old datalad-core version can use generation 2. So the information in Maybe the best use for [did some minor editing, still had a light fever when I was typing the original message] |
is ab79082 something what you have in mind @christian-monch ? on one hand I like it but I still lack use case for it really: if future metalad drops support for some prior generation, that version would potentially break compatibility with extensions which relied on it regardless on either we list it or not here. But indeed it would make it more explicit, e.g. error could say that metalad supports only such and such generations - update to them. So not unlike versions of git-annex repo - git-annex supports a range of them and allows upgrades, but eventually prunes some old down . |
Yes, thanks a lot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @yarikoptic
ok, for better or for worse it was blessed with an approval so lets' proceed! |
This way we could quickly tell one "generation" of things from another. To be used in #340 .
I don't mind if later or now it is formalized even better. FWIW in now deprecated
search
we haveso I used "compatible" identifiers here.