
Batch processing of MCF files from a user-specified location #63

Open
alexandreleroux opened this issue Dec 20, 2016 · 3 comments

@alexandreleroux
Contributor

Feature request for the capability to launch pygeometa and process multiple MCF files located in a user-specified directory with a single command.

This 'batch mode' could be invoked directly from pygeometa with the --batch argument. If --batch is specified, pygeometa is run for every MCF file in the directory specified with --mcf=, and each output file is named after its input file but ends with .xml and is saved in the directory specified by --output=. Other pygeometa arguments, such as --schema=, are applied to all MCF files processed in the batch.

An initial version of this batch mode could simply loop over all MCF files and generate the corresponding XML files. The batch mode should skip MCF files that serve as a base_mcf for other files, since they can't be processed on their own.
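A minimal sketch of that initial loop, assuming a base file is one that another MCF names on a line like `base_mcf=common.mcf`, and assuming the actual MCF-to-XML rendering is supplied as a callable (`render` is a hypothetical stand-in, not pygeometa's API):

```python
import re
from pathlib import Path
from typing import Callable, List, Set

# Assumption: base files are referenced as "base_mcf=<relative path>" in other MCFs
BASE_MCF_RE = re.compile(r"^\s*base_mcf\s*=\s*(\S+)", re.MULTILINE)


def find_base_mcfs(mcf_files: List[Path]) -> Set[Path]:
    """Return the resolved paths of MCF files that other MCFs use as base_mcf."""
    bases = set()
    for mcf in mcf_files:
        for name in BASE_MCF_RE.findall(mcf.read_text()):
            bases.add((mcf.parent / name).resolve())
    return bases


def batch_process(mcf_dir: str, output_dir: str,
                  render: Callable[[Path], str]) -> List[Path]:
    """Render every non-base .mcf file in mcf_dir to <name>.xml in output_dir.

    `render` is a hypothetical callable that turns one MCF file into XML text.
    """
    mcf_files = sorted(Path(mcf_dir).glob("*.mcf"))
    bases = find_base_mcfs(mcf_files)
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for mcf in mcf_files:
        if mcf.resolve() in bases:
            continue  # base MCFs can't be processed on their own
        target = out / (mcf.stem + ".xml")  # same name, .xml extension
        target.write_text(render(mcf))
        written.append(target)
    return written
```

Only the top level of the folder is scanned here (`glob`, not `rglob`), matching the scope of a first version.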

Future versions could:

  • Recursively search the --mcf= folder so that .mcf files in subfolders are processed as well, reproducing the same directory structure under --output=
  • Provide a log with warnings, errors and outputs. The log location and name could be specified by the --log= argument
  • Check the output directory for a corresponding XML file and skip processing if the input MCF file has not changed since the batch mode was last run. The time of the last run could be recorded in the log file found at the --log= location
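A rough sketch of the three future features together, under stated assumptions: it walks subfolders with `rglob` and mirrors them under the output folder, reports through Python's standard `logging` module (the logger name is hypothetical), and, instead of reading a last-run time from the log, it compares file modification times of the MCF and its existing XML. `render` is again a hypothetical stand-in for the real rendering call, and base-MCF skipping is left out for brevity.

```python
import logging
from pathlib import Path
from typing import Callable, List

logger = logging.getLogger("pygeometa.batch")  # hypothetical logger name


def needs_update(mcf: Path, xml: Path) -> bool:
    """True if the XML is missing or older than its MCF (mtime comparison)."""
    return not xml.exists() or mcf.stat().st_mtime > xml.stat().st_mtime


def batch_process_tree(mcf_dir: str, output_dir: str,
                       render: Callable[[Path], str]) -> List[Path]:
    """Recursively render .mcf files, mirroring subfolders under output_dir."""
    src_root, out_root = Path(mcf_dir), Path(output_dir)
    written = []
    for mcf in sorted(src_root.rglob("*.mcf")):
        # Mirror the layout: <mcf_dir>/a/b.mcf -> <output_dir>/a/b.xml
        target = (out_root / mcf.relative_to(src_root)).with_suffix(".xml")
        if not needs_update(mcf, target):
            logger.info("skipped (unchanged): %s", mcf)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        try:
            target.write_text(render(mcf))
        except Exception as err:  # keep going, but record the failure
            logger.error("failed: %s (%s)", mcf, err)
            continue
        logger.info("wrote %s", target)
        written.append(target)
    return written
```

A --log= option could then map onto `logging.basicConfig(filename=..., level=logging.INFO)` before the batch starts. The mtime check needs no stored state, but it only sees the MCF itself, so changes to a referenced base_mcf would not trigger regeneration.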

Thoughts / comments?

@alexandreleroux
Contributor Author

Regarding skipping XML generation when the MCF has not changed since the batch mode was last run: instead of looking for dates in the log (no logs exist at the moment), we could look at the DateTime value within dateStamp, if present in the XML.

  <gmd:dateStamp>
    <gco:DateTime>2016-12-22T16:34:15Z</gco:DateTime>
  </gmd:dateStamp>

Does this make sense? While there are multiple DateTime values in the output XML, there's only a single one within dateStamp. I'm not certain whether this logic applies to other schemas, though.
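A small sketch of the dateStamp check using the standard library's ElementTree with the ISO 19139 `gmd`/`gco` namespaces. It assumes the value is a `gco:DateTime` in the `YYYY-MM-DDTHH:MM:SSZ` form shown above, and that dateStamp records when the XML was generated; if dateStamp is instead copied from a value in the MCF, this comparison would not detect edits. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# ISO 19139 namespaces used in the dateStamp snippet
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def read_datestamp(xml_text: str) -> Optional[datetime]:
    """Return gmd:dateStamp/gco:DateTime as an aware UTC datetime, or None."""
    root = ET.fromstring(xml_text)
    node = root.find(".//gmd:dateStamp/gco:DateTime", NS)
    if node is None or not node.text:
        return None  # e.g. dateStamp absent, or holding a gco:Date instead
    # Assumes the "YYYY-MM-DDTHH:MM:SSZ" form, i.e. UTC
    return datetime.strptime(node.text.strip(), "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc)


def mcf_changed_since_xml(mcf: Path, xml: Path) -> bool:
    """True if the XML is missing, has no usable dateStamp, or predates the MCF."""
    if not xml.exists():
        return True
    stamp = read_datestamp(xml.read_text())
    if stamp is None:
        return True  # no usable dateStamp: regenerate to be safe
    mtime = datetime.fromtimestamp(mcf.stat().st_mtime, tz=timezone.utc)
    return mtime > stamp
```

Falling back to regeneration when dateStamp is missing keeps this safe for schemas that don't use this element.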

@RousseauLambertLP
Contributor

I wrote a first version of the script with a --batch option. With this option, --mcf and --output must both point to folders. I will update the documentation and let you know once this is done.

First version of the script: https://github.com/RousseauLambertLP/pygeometa/blob/issue-63/pygeometa/core.py
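For illustration only, a hypothetical argparse sketch of the argument check described above (not the code in the linked core.py): when --batch is set, both --mcf and --output must be existing folders, and other options such as --schema pass through unchanged.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical CLI parsing; --batch switches --mcf/--output to folders."""
    parser = argparse.ArgumentParser(prog="pygeometa")
    parser.add_argument("--mcf", required=True,
                        help="MCF file, or folder of MCF files with --batch")
    parser.add_argument("--schema", help="output schema, applied to every file")
    parser.add_argument("--output", help="output file, or folder with --batch")
    parser.add_argument("--batch", action="store_true",
                        help="process every .mcf file in the --mcf folder")
    args = parser.parse_args(argv)
    if args.batch:
        # parser.error() prints the message and exits with status 2
        if not Path(args.mcf).is_dir():
            parser.error("--batch requires --mcf to be a folder")
        if args.output is None or not Path(args.output).is_dir():
            parser.error("--batch requires --output to be an existing folder")
    return args
```

Validating up front means a mistyped folder fails immediately instead of partway through a batch.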

@RousseauLambertLP
Contributor

I just added a use case to the README file.

https://github.com/RousseauLambertLP/pygeometa/blob/issue-63/README.md
