Add __generation__ to metadata extractors to be able to tell one from another #351

yarikoptic · 2023-02-28T17:20:50Z

This way we could quickly tell one "generation" of things from another. To be used in #340.
I don't mind if it gets formalized even better, now or later. FWIW, the now-deprecated search has

        metadata_source=Parameter(
            args=('--metadata-source',),
            choices=('legacy', 'gen4', 'all'),
            doc="""if given, defines which metadata source will be used to
            search. 'legacy' will limit search to metadata in the old format,
            i.e. stored in '$DATASET/.datalad/metadata'. 'gen4' will limit
            search to metadata stored by the git-backend of 
            'datalad-metadata-model'. If 'all' is given, metadata from all
            supported sources will be included in the search. The default is
            'legacy'.""")

so I used "compatible" identifiers here.


codecov-commenter · 2023-02-28T17:42:32Z

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.64 🎉

Comparison is base (fc20e5c) 86.42% compared to head (ab79082) 87.06%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #351      +/-   ##
==========================================
+ Coverage   86.42%   87.06%   +0.64%     
==========================================
  Files          88       92       +4     
  Lines        4831     5333     +502     
==========================================
+ Hits         4175     4643     +468     
- Misses        656      690      +34

Impacted Files                        Coverage Δ
datalad_metalad/__init__.py           100.00% <100.00%> (ø)
datalad_metalad/extractors/base.py    88.09% <100.00%> (+1.42%) ⬆️

... and 6 files with indirect coverage changes


datalad_metalad/extractors/base.py

christian-monch

I think it is generally a good idea to have a user-facing capability to distinguish extractor interface generations. But I think we should just use integers to identify the extractor generation, especially since the generation is independent of the metadata storage format (we use legacy extractors to create new-style metadata). The generations would therefore identify two properties:

1. How the extractor class must be used by extract.py (we get this information internally from the base classes, though).
2. Whether the extractor supports file-level extraction, dataset-level extraction, or both (we get this information internally from the base classes as well).
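To illustrate how the second property can be derived internally from the class hierarchy, here is a minimal sketch. It assumes hypothetical base classes `DatasetExtractorBase` and `FileExtractorBase`; these are stand-ins, not necessarily the actual class names in `datalad_metalad/extractors/base.py`.

```python
from typing import Tuple, Type


# Hypothetical stand-ins for the extractor base classes; the real class
# names in datalad_metalad/extractors/base.py may differ.
class ExtractorBase:
    """Common base of all (hypothetical) extractor classes."""


class DatasetExtractorBase(ExtractorBase):
    """Subclassing this signals dataset-level extraction support."""


class FileExtractorBase(ExtractorBase):
    """Subclassing this signals file-level extraction support."""


def extraction_modes(extractor_class: Type[ExtractorBase]) -> Tuple[str, ...]:
    """Derive the supported extraction modes from the class hierarchy."""
    modes = []
    if issubclass(extractor_class, DatasetExtractorBase):
        modes.append("dataset")
    if issubclass(extractor_class, FileExtractorBase):
        modes.append("file")
    return tuple(modes)


# Example extractors built on the stand-in base classes.
class MyDatasetExtractor(DatasetExtractorBase):
    pass


class MyFileExtractor(FileExtractorBase):
    pass


class MyDualExtractor(DatasetExtractorBase, FileExtractorBase):
    pass
```

With these definitions, `extraction_modes(MyDualExtractor)` yields `("dataset", "file")`. The information is implicit in the inheritance, which is why it is not directly visible to a user.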

The first property is transparent to a user of the new metalad. The second property might be of interest to a user, though. We might therefore just present the capabilities w.r.t. file-level and dataset-level extraction instead of an extractor generation.

For example, we could have a property called __extraction_modes__ and values like ('dataset',), ('file',), or ('dataset', 'file'). WDYT?
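A minimal sketch of the proposed explicit declaration, assuming `__extraction_modes__` is a plain class attribute holding a tuple of mode names. The example classes and the `supports_mode` helper are hypothetical and only illustrate how a caller could query the attribute.

```python
from typing import Tuple

# The two mode names proposed in the comment above.
VALID_EXTRACTION_MODES = frozenset({"dataset", "file"})


class ExampleDatasetExtractor:
    # Hypothetical extractor that only supports dataset-level extraction.
    __extraction_modes__: Tuple[str, ...] = ("dataset",)


class ExampleFileExtractor:
    # Hypothetical extractor that only supports file-level extraction.
    __extraction_modes__: Tuple[str, ...] = ("file",)


class ExampleDualExtractor:
    # Hypothetical extractor that supports both levels.
    __extraction_modes__: Tuple[str, ...] = ("dataset", "file")


def supports_mode(extractor_class: type, mode: str) -> bool:
    """Return True if the class declares support for the given mode.

    Classes without ``__extraction_modes__`` are treated as declaring
    no modes at all.
    """
    if mode not in VALID_EXTRACTION_MODES:
        raise ValueError(f"unknown extraction mode: {mode!r}")
    return mode in getattr(extractor_class, "__extraction_modes__", ())
```

Because names that both start and end with double underscores are not subject to Python's private name mangling, `__extraction_modes__` can be read from outside the class exactly as written.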

datalad_metalad/extractors/base.py

jsheunis · 2023-03-03T09:44:04Z

> For example, we could have a property called __extraction_modes__ and values like ('dataset',), ('file',), or ('dataset', 'file'). WDYT?

Unambiguous, I like it.

Co-authored-by: Christian Mönch <[email protected]>

christian-monch · 2023-03-08T09:11:16Z

> Unambiguous, I like it.

In order to resolve this quickly, I think it is best to discuss this topic separately. I opened issue #365 for that purpose.

christian-monch · 2023-03-08T09:27:49Z

One more thing: the generation is mainly interesting for knowing which metalad version can use the extractor. The latest version (>= 0.3) can use generations 4, 3, and 2; version 0.2 can use generations 3 and 2; and the old datalad-core implementation can use only generation 2.
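The compatibility statement above can be summarized as a small lookup table. This is only a restatement of the comment; the host labels are informal strings chosen for this sketch, not identifiers used by metalad.

```python
from typing import Dict, List, Tuple

# Which extractor generations each host can run, as stated above.
# The host labels are informal and exist only for this sketch.
GENERATIONS_BY_HOST: Dict[str, Tuple[int, ...]] = {
    "datalad-core (legacy metadata)": (2,),
    "datalad-metalad 0.2": (3, 2),
    "datalad-metalad >= 0.3": (4, 3, 2),
}


def hosts_supporting(generation: int) -> List[str]:
    """List the hosts that can run an extractor of the given generation."""
    return [
        host
        for host, generations in GENERATIONS_BY_HOST.items()
        if generation in generations
    ]
```

For example, `hosts_supporting(4)` returns only `["datalad-metalad >= 0.3"]`, while a generation 2 extractor can be run by all three hosts.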

So the information in __generation__ is mainly of interest if you want to use older metalad versions, i.e. <= 0.2, to invoke extractors. But older versions of metalad also ship older extractor base classes, which do not have a __generation__ property.

Maybe the best use for __generation__ would be to determine at run time whether an extractor is compatible with a given datalad and datalad-metalad version. For this purpose, it might be prudent to also state the supported generations in the metalad code. WDYT, @yarikoptic?

[did some minor editing, still had a light fever when I was typing the original message]

yarikoptic · 2023-03-08T15:23:47Z

Is ab79082 what you had in mind, @christian-monch? On one hand I like it, but I still lack a real use case for it: if a future metalad drops support for some older generation, that version would potentially break compatibility with extensions that relied on it, regardless of whether we list it here or not. But it would indeed make things more explicit, e.g. an error could say that metalad supports only such-and-such generations and that the extension should be updated to one of them. So this is not unlike git-annex repository versions: git-annex supports a range of them and allows upgrades, but eventually prunes old ones.
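A minimal sketch of such a run-time check with an explanatory error, assuming the host advertises its supported generations as a set of integers (in the spirit of the `__supported_generations__` added in ab79082; its actual location and type are not shown here). `UnsupportedGenerationError`, `check_extractor_generation`, and `NewExtractor` are hypothetical names.

```python
from typing import FrozenSet, Optional

# Assumption: the host advertises the generations it can run. The value
# mirrors the statement that metalad >= 0.3 runs generations 2, 3, and 4.
SUPPORTED_GENERATIONS: FrozenSet[int] = frozenset({2, 3, 4})


class UnsupportedGenerationError(Exception):
    """Hypothetical error type used only in this sketch."""


def check_extractor_generation(
    extractor_class: type,
    supported: FrozenSet[int] = SUPPORTED_GENERATIONS,
) -> int:
    """Return the extractor's generation or raise an explanatory error."""
    generation: Optional[int] = getattr(extractor_class, "__generation__", None)
    if generation is None:
        # Older extractor base classes do not define __generation__ at all.
        raise UnsupportedGenerationError(
            f"{extractor_class.__name__} does not declare __generation__"
        )
    if generation not in supported:
        raise UnsupportedGenerationError(
            f"{extractor_class.__name__} is a generation {generation} "
            f"extractor, but only generations {sorted(supported)} are "
            "supported; please update the extractor"
        )
    return generation


class NewExtractor:
    # Hypothetical extractor declaring the newest generation.
    __generation__ = 4
```

As noted in the thread, such a check does not prevent breakage when a generation is dropped; it only turns that breakage into an explicit message listing the supported generations.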

christian-monch · 2023-03-21T10:03:45Z

> is ab79082 something what you have in mind @christian-monch ? [...]

Yes, thanks a lot.

christian-monch

Thanks a lot @yarikoptic

yarikoptic · 2023-03-24T13:04:28Z

OK, for better or for worse it was blessed with an approval, so let's proceed!

Add __generation__ to metadata extractors to be able to tell one from another (5f623bb)

yarikoptic mentioned this pull request Feb 28, 2023 in #340 "Add a way to list all installed metalad extractors, filter, processors, pipelines, etc." (open)

yarikoptic mentioned this pull request Feb 28, 2023 in datalad/datalad#7309 "WTF - bring back and extend information on metadata extractors etc" (merged)

jsheunis reviewed Mar 1, 2023


datalad_metalad/extractors/base.py (outdated)

christian-monch requested changes Mar 3, 2023


datalad_metalad/extractors/base.py (outdated)

datalad_metalad/extractors/base.py (outdated)

Use integers for __generation__ instead of ad-hoc literals (94121ba)

Co-authored-by: Christian Mönch <[email protected]>

christian-monch mentioned this pull request Mar 8, 2023 in #365 "List extractor capabilities, e.g. in datalad wtf-output" (open)

Add __supported_generations__ (ab79082)

christian-monch approved these changes Mar 21, 2023


yarikoptic merged commit d87bff8 into datalad:master Mar 24, 2023
