Provide a default implementation for locating metadata source files for extraction #379
Comments
metalad supports an unlimited number of "runtime" arguments that can be provided via CLI or API to the extractors. For example: `datalad meta-extract my_awesome_extractor extractor_arg_1 extractor_arg_2 ... extractor_arg_n`

The number and nature of the additional arguments is not defined by (The topic is also mentioned in psychoinformatics-de/datalad-tabby#2)
I think this relates to my "desire" to just be able to extract all metadata an extractor can extract across the files in the dataset. E.g. for datalad-catalog we need But overall it is likely the "property" of an extractor to know which files it could extract metadata about, instead of me feeding each extractor with all the paths even if they are not appropriate for it.
I'm thinking about streamlining and deduplicating code for metadata extractors in the context of psychoinformatics-de/datalad-tabby#2.
The goal is to provide a method with which agents can supply arguments (or not) to `meta-extract` that allow flexibility for locating files with metadata sources. Currently, there are multiple ways that this is done:
- `metadata_source` file, e.g. the `bids_dataset` extractor in `datalad-neuroimaging` will always look for the `./dataset_description.json` file relative to the root of the dataset.
- `genericjson_dataset` extractor in `datalad-metalad` uses a combination of a default location (`.metadata/dataset.json`) and location(s) provided as extraction arguments during the `meta-extract` call.

I think it makes sense to provide a standard implementation for this process within metalad, so that there doesn't have to be any code duplication in extractor code. My suggestion is:
- `meta-extract` call with the `metadata_source` extraction argument (multiple = serialized list)

Any extractor will then take the extractor arguments as priority, and will default to the dataset configuration.
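
To make the proposed priority order concrete, here is a minimal sketch of how a shared helper might resolve metadata source files. It is not existing metalad code: the function names, the configuration key `datalad.metalad.metadata-source`, the use of JSON for the "serialized list", and the final fallback to an extractor-supplied default are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Iterable, List, Mapping

# Hypothetical configuration key; the issue does not name one.
CONFIG_KEY = "datalad.metalad.metadata-source"


def parse_sources(value: str) -> List[str]:
    """Interpret a value as one path or a JSON-serialized list of paths."""
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        # Not valid JSON, so treat the whole string as a single path.
        return [value]
    if isinstance(parsed, list):
        return [str(item) for item in parsed]
    return [value]


def resolve_metadata_sources(
    dataset_root: str,
    extractor_args: Mapping[str, str],
    dataset_config: Mapping[str, str],
    default_sources: Iterable[str] = (),
) -> List[Path]:
    """Resolve source files: extractor arguments first, then dataset config.

    If neither is set, fall back to the extractor's own default locations
    (an assumption that mirrors the current `genericjson_dataset` behavior).
    """
    raw = extractor_args.get("metadata_source")
    if raw is None:
        raw = dataset_config.get(CONFIG_KEY)
    sources = parse_sources(raw) if raw is not None else list(default_sources)
    # Paths are interpreted relative to the root of the dataset.
    return [Path(dataset_root) / source for source in sources]
```

An extractor could then call `resolve_metadata_sources(root, args, config, default_sources=[".metadata/dataset.json"])` instead of implementing its own lookup, which is the deduplication this issue is after.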