Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add __generation__ to metadata extractors to be able to tell one from another #351

Merged
merged 3 commits into from
Mar 24, 2023

Conversation

yarikoptic
Copy link
Member

This way we could quickly tell one "generation" of things from another. To be used in #340 .
I don't mind if later or now it is formalized even better. FWIW in now deprecated search we have

        metadata_source=Parameter(
            args=('--metadata-source',),
            choices=('legacy', 'gen4', 'all'),
            doc="""if given, defines which metadata source will be used to
            search. 'legacy' will limit search to metadata in the old format,
            i.e. stored in '$DATASET/.datalad/metadata'. 'gen4' will limit
            search to metadata stored by the git-backend of 
            'datalad-metadata-model'. If 'all' is given, metadata from all
            supported sources will be included in the search. The default is
            'legacy'.""")

so I used "compatible" identifiers here.

@codecov-commenter
Copy link

codecov-commenter commented Feb 28, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.64 🎉

Comparison is base (fc20e5c) 86.42% compared to head (ab79082) 87.06%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #351      +/-   ##
==========================================
+ Coverage   86.42%   87.06%   +0.64%     
==========================================
  Files          88       92       +4     
  Lines        4831     5333     +502     
==========================================
+ Hits         4175     4643     +468     
- Misses        656      690      +34     
Impacted Files Coverage Δ
datalad_metalad/__init__.py 100.00% <100.00%> (ø)
datalad_metalad/extractors/base.py 88.09% <100.00%> (+1.42%) ⬆️

... and 6 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Collaborator

@christian-monch christian-monch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is generally a good idea to have a "user"-facing capability to distinguish extractor-interface-generations. But I think we should just use integers to identify the extractor generation. Especially since it is independent of the metadata storage format (we use legacy extractors to create new-style metadata). The generations would therefore identify two properties:

  • how the extractor class must be used by extract.py (we get this information internally from the base classes though).
  • whether the extractor supports file-level extraction, dataset-level extraction, or both (we get this information internally from the base classes as well).

The first property is transparent to a user of the new metalad. The second property might be of interest to a user though. We might therefore just present the capabilities w.r.t file-level and dataset-level extraction instead of an extractor generation.

For example, we could have a property called __extraction_modes__ and values like ('dataset',), ('file',), or ('dataset', 'file'). WDYT?

datalad_metalad/extractors/base.py Outdated Show resolved Hide resolved
datalad_metalad/extractors/base.py Outdated Show resolved Hide resolved
@jsheunis
Copy link
Member

jsheunis commented Mar 3, 2023

For example, we could have a property called extraction_modes and values like ('dataset',), ('file',), or ('dataset', 'file'). WDYT?

Unambiguous, I like it.

@christian-monch
Copy link
Collaborator

Unambiguous, I like it.

In order to resolve this quickly, I think it is best to discuss this topic separately. I added issue #365 to for that purpose.

@christian-monch
Copy link
Collaborator

christian-monch commented Mar 8, 2023

One more thing. The generation is basically interesting in order to know, which metalad-version can use the extractor. The latest version (>=0.3) can use generations 4, 3, and 2. The 0.2-version can use generations 3 and 2, and the old datalad-core version can use generation 2.

So the information in __generation__ is mainly of interest if you want to use older metalad versions, i.e. <= 0.2, to invoke extractors. But, older versions of metalad would also provide older extractor base-classes, which do not contain a __generation__-property.

Maybe the best use for __generation__ would be to determine at run-time, whether an extractor is compatible with a given datalad and datalad-metalad version. For this purpose it might be prudent to also state the supported generations in the metalad-code? WDYT @yarikoptic

[did some minor editing, still had a light fever when I was typing the original message]

@yarikoptic
Copy link
Member Author

is ab79082 something what you have in mind @christian-monch ? on one hand I like it but I still lack use case for it really: if future metalad drops support for some prior generation, that version would potentially break compatibility with extensions which relied on it regardless on either we list it or not here. But indeed it would make it more explicit, e.g. error could say that metalad supports only such and such generations - update to them. So not unlike versions of git-annex repo - git-annex supports a range of them and allows upgrades, but eventually prunes some old down .

@christian-monch
Copy link
Collaborator

is ab79082 something what you have in mind @christian-monch ? [...]

Yes, thanks a lot.

Copy link
Collaborator

@christian-monch christian-monch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @yarikoptic

@yarikoptic
Copy link
Member Author

ok, for better or for worse it was blessed with an approval so lets' proceed!

@yarikoptic yarikoptic merged commit d87bff8 into datalad:master Mar 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants