This is an informative issue:
Using meta-conduct and a simple extraction pipeline on smaug, we extracted about 1,300,000 file-level metadata records with the following command:
openneuro> date; time datalad -f json meta-conduct ~/tmp/extract_test.json \
    traverser.top_level_dir=$(pwd) \
    traverser.item_type=both \
    traverser.traverse_sub_datasets=True \
    extractor.extractor_type=file \
    extractor.extractor_name=metalad_core \
    >/.../extract_test_1.jsonl 2>/.../extract_test_1.err
Tue 28 Mar 2023 01:08:38 AM EDT

real 464m43.061s
user 3171m24.090s
sys 2597m34.377s
The extraction pipeline in file ~/tmp/extract_test.json has the following content:
{
  "provider": {
    "module": "datalad_metalad.pipeline.provider.datasettraverse",
    "class": "DatasetTraverser",
    "name": "traverser",
    "arguments": {}
  },
  "processors": [
    {
      "module": "datalad_metalad.pipeline.processor.extract",
      "class": "MetadataExtractor",
      "name": "extractor",
      "arguments": {}
    }
  ]
}
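The commands in this issue pass settings such as traverser.item_type=both, where the prefix before the dot appears to match the "name" of a pipeline element in the file above. The following Python sketch only illustrates that relationship; it is not how meta-conduct itself parses its arguments, and the helper names element_names and split_overrides are hypothetical.

```python
import json

# Pipeline definition copied from ~/tmp/extract_test.json as quoted in this issue.
PIPELINE = json.loads("""
{
  "provider": {
    "module": "datalad_metalad.pipeline.provider.datasettraverse",
    "class": "DatasetTraverser",
    "name": "traverser",
    "arguments": {}
  },
  "processors": [
    {
      "module": "datalad_metalad.pipeline.processor.extract",
      "class": "MetadataExtractor",
      "name": "extractor",
      "arguments": {}
    }
  ]
}
""")


def element_names(pipeline):
    """Return the names of the provider and of every processor, in order."""
    names = [pipeline["provider"]["name"]]
    names += [proc["name"] for proc in pipeline.get("processors", [])]
    return names


def split_overrides(args):
    """Group 'element.key=value' strings by element name (hypothetical helper)."""
    grouped = {}
    for arg in args:
        target, value = arg.split("=", 1)
        element, key = target.split(".", 1)
        grouped.setdefault(element, {})[key] = value
    return grouped


# Arguments taken from the first command in this issue
# (top_level_dir is left out because it depends on the working directory).
overrides = split_overrides([
    "traverser.item_type=both",
    "traverser.traverse_sub_datasets=True",
    "extractor.extractor_type=file",
    "extractor.extractor_name=metalad_core",
])

# Any prefix that does not name a pipeline element would show up here.
unknown = set(overrides) - set(element_names(PIPELINE))
print(element_names(PIPELINE), overrides, unknown)
```

A check like this can catch a mistyped element prefix before starting a run that takes several hours.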
Info on bids_dataset extraction performance on openneuro:
openneuro> time datalad -f json meta-conduct ~/tmp/extract_test.json \
    traverser.top_level_dir=$(pwd) \
    traverser.item_type=dataset \
    traverser.traverse_sub_datasets=True \
    extractor.extractor_type=dataset \
    extractor.extractor_name=bids_dataset \
    >/.../extract_test_bids_dataset_1.jsonl 2>/.../extract_test_bids_dataset_1.err

real 25m23.549s
user 146m47.854s
sys 4m54.585s
Info on metalad_core extraction performance on haxby (file-level run first, then dataset-level run):
haxby> time datalad -f json meta-conduct ~/tmp/extract_test.json \
    traverser.top_level_dir=$(pwd) \
    traverser.item_type=file \
    traverser.traverse_sub_datasets=True \
    extractor.extractor_type=file \
    extractor.extractor_name=metalad_core \
    2>/.../test_haxby_1.err >/.../test_haxby_1.jsonl

real 10m49.770s
user 28m33.421s
sys 30m13.129s

haxby> time datalad -f json meta-conduct ~/tmp/extract_test.json \
    traverser.top_level_dir=$(pwd) \
    traverser.item_type=dataset \
    traverser.traverse_sub_datasets=True \
    extractor.extractor_type=dataset \
    extractor.extractor_name=metalad_core \
    2>/.../test_haxby_dataset_1.err >/.../test_haxby_dataset_1.jsonl

real 0m4.564s
user 0m7.279s
sys 0m3.550s
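To make the timings in this issue easier to compare, here is a small Python sketch that converts the real/user/sys values into seconds and derives two rough figures: the ratio of CPU time to wall-clock time, and, for the openneuro metalad_core run, records per second. The run labels and helper names are made up for illustration, and the record count is the approximate 1,300,000 quoted in the issue, so the rate is only an estimate.

```python
import re


def to_seconds(stamp):
    """Convert a bash `time` value such as '464m43.061s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# (real, user, sys) exactly as reported in this issue; labels are illustrative.
RUNS = {
    "openneuro metalad_core item_type=both": ("464m43.061s", "3171m24.090s", "2597m34.377s"),
    "openneuro bids_dataset item_type=dataset": ("25m23.549s", "146m47.854s", "4m54.585s"),
    "haxby metalad_core item_type=file": ("10m49.770s", "28m33.421s", "30m13.129s"),
    "haxby metalad_core item_type=dataset": ("0m4.564s", "0m7.279s", "0m3.550s"),
}


def summarize(real, user, sys_time):
    """Return wall-clock seconds and the (user + sys) / real ratio."""
    wall = to_seconds(real)
    cpu = to_seconds(user) + to_seconds(sys_time)
    return wall, cpu / wall


for label, times in RUNS.items():
    wall, ratio = summarize(*times)
    print(f"{label}: {wall:.0f} s wall, CPU/wall ratio {ratio:.1f}")

# Approximate throughput of the first run, using the "about 1,300,000" figure.
APPROX_RECORDS = 1_300_000
wall_first, _ = summarize(*RUNS["openneuro metalad_core item_type=both"])
records_per_second = APPROX_RECORDS / wall_first
print(f"approx. {records_per_second:.0f} records per second")
```

A CPU/wall ratio well above 1 indicates that the run used several cores at once. The large sys share in the metalad_core file-level runs is visible directly in the raw numbers.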