Batch processing of MCF files from a user-specified location #63

alexandreleroux · 2016-12-20T20:04:39Z

Feature request for the capability to launch pygeometa and process multiple MCF files located in a user-specified directory with a single command.

This 'batch mode' could be invoked directly from pygeometa with the --batch argument. If --batch is specified, pygeometa is run for every MCF file in the directory specified with --mcf=, and each output file takes the name of its input file with an .xml extension and is saved in the directory specified by --output. Other pygeometa arguments, such as --schema=, are applied to all MCF files processed in the batch.

An initial version of this batch mode can simply loop over all MCF files and generate the corresponding XML files. The batch mode should skip base_mcf files, which can't be processed on their own.
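
A minimal sketch of that first version, under a few assumptions: the names `find_base_mcfs` and `batch_render` are hypothetical, the rendering step is passed in as a callable rather than tied to any particular pygeometa function, and a base MCF is detected as any file that another MCF references through a `base_mcf=` line.

```python
import re
from pathlib import Path

# Assumption: a base file is referenced from another MCF by a line such as
# "base_mcf=some/base.mcf".
BASE_MCF_RE = re.compile(r"^[ \t]*base_mcf[ \t]*[=:][ \t]*(\S+)", re.MULTILINE)


def find_base_mcfs(mcf_dir):
    """Return the file names that other MCFs in mcf_dir reference as base_mcf."""
    bases = set()
    for path in Path(mcf_dir).glob("*.mcf"):
        text = path.read_text(encoding="utf-8")
        for match in BASE_MCF_RE.finditer(text):
            bases.add(Path(match.group(1)).name)
    return bases


def batch_render(mcf_dir, output_dir, render, schema=None):
    """Render every non-base MCF in mcf_dir to <same name>.xml in output_dir.

    render is a hypothetical callable taking (mcf_path, schema) and
    returning the XML document as a string.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    skip = find_base_mcfs(mcf_dir)
    written = []
    for path in sorted(Path(mcf_dir).glob("*.mcf")):
        if path.name in skip:
            continue  # base MCFs can't be processed on their own
        target = out / (path.stem + ".xml")
        target.write_text(render(str(path), schema), encoding="utf-8")
        written.append(target)
    return written
```

In practice, `render` would wrap whatever rendering call the installed pygeometa version exposes. Keeping it as a parameter means the loop itself can be tested without pygeometa.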

Future versions could:

Recursively search the --mcf= folder to process .mcf files located in subfolders as well, and reproduce the same directory structure under --output=
Provide a log with warnings, errors and outputs. The log location and name could be specified by the --log= argument
Look in the output directory to see whether a corresponding XML file exists, and skip the run if the input MCF file has not changed since the last time the batch mode was run. The time of the last batch run could be recorded in the log file found at the --log= location
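
A rough sketch of how those three ideas could fit together. All names here (`mirrored_outputs`, `is_stale`, `select_work`, the `batch_sketch` logger) are hypothetical, and file modification times stand in for the "last run" timestamp that the proposal would read from the log file.

```python
import logging
from pathlib import Path

log = logging.getLogger("batch_sketch")  # hypothetical logger name


def mirrored_outputs(mcf_root, output_root):
    """Pair every .mcf under mcf_root with a mirrored .xml path under output_root."""
    mcf_root, output_root = Path(mcf_root), Path(output_root)
    pairs = []
    for src in sorted(mcf_root.rglob("*.mcf")):
        rel = src.relative_to(mcf_root).with_suffix(".xml")
        pairs.append((src, output_root / rel))
    return pairs


def is_stale(src, dst):
    """True when dst is missing or older than src (mtime comparison)."""
    return not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime


def select_work(mcf_root, output_root):
    """Return the (mcf, xml) pairs that still need rendering, logging each decision."""
    work = []
    for src, dst in mirrored_outputs(mcf_root, output_root):
        if is_stale(src, dst):
            log.info("will render %s -> %s", src, dst)
            work.append((src, dst))
        else:
            log.info("skipping %s (output is up to date)", src)
    return work
```

Calling `logging.basicConfig(filename=..., level=logging.INFO)` with the --log= value before `select_work` would send these messages to the requested log file. The caller still has to create each output's parent directory before writing.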

Thoughts / comments?

alexandreleroux · 2016-12-22T16:42:32Z

Regarding the possibility of skipping the generation of the XML when the MCF has not changed since the last time the batch mode was run: instead of looking for dates in the log (no logs exist at the moment), we could consider looking at the DateTime value within dateStamp, if present in the XML.

  <gmd:dateStamp>
    <gco:DateTime>2016-12-22T16:34:15Z</gco:DateTime>
  </gmd:dateStamp>

Does this make sense? While there are multiple DateTime values in the output XML, there is only a single one within dateStamp. Not certain if this logic applies to other schemas though.
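
As a sketch of that check, assuming the output follows the ISO 19139 layout shown above (with the standard gmd and gco namespace URIs) and that dateStamp reflects when the XML was generated. The function names are hypothetical, and the MCF's file modification time is used as the "last changed" signal.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from pathlib import Path

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def read_datestamp(xml_path):
    """Return gmd:dateStamp/gco:DateTime as an aware datetime, or None if absent."""
    root = ET.parse(str(xml_path)).getroot()
    node = root.find("gmd:dateStamp/gco:DateTime", NS)
    if node is None or not node.text:
        return None
    text = node.text.strip()
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"  # older fromisoformat() rejects a trailing Z
    stamp = datetime.fromisoformat(text)
    if stamp.tzinfo is None:
        stamp = stamp.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return stamp


def mcf_changed_since(mcf_path, xml_path):
    """True when the MCF was modified after the XML's dateStamp (or none exists)."""
    stamp = read_datestamp(xml_path)
    if stamp is None:
        return True  # no usable dateStamp, so regenerate to be safe
    modified = datetime.fromtimestamp(Path(mcf_path).stat().st_mtime, tz=timezone.utc)
    return modified > stamp
```

If dateStamp is copied from a value in the MCF rather than set at generation time, this comparison would not be meaningful, and schemas that carry no dateStamp would always be regenerated.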

RousseauLambertLP · 2017-12-08T19:30:13Z

I wrote a first version of the script with a --batch option. With this option, we need to provide a folder for both --mcf and --output. I will update the documentation and let you know once this is done.

First version of the script: https://github.com/RousseauLambertLP/pygeometa/blob/issue-63/pygeometa/core.py

RousseauLambertLP · 2017-12-08T19:43:36Z

I just added one use case in the README file.

https://github.com/RousseauLambertLP/pygeometa/blob/issue-63/README.md

alexandreleroux added enhancement help wanted labels Dec 20, 2016

alexandreleroux mentioned this issue Dec 20, 2016

Autolaunch of post-processing script #64

Closed
