This is to initiate a debate on pipeline.ini files, to provide a convention that will work for input validation.
In its current format, input validation would happen directly on the ini file before anything else happens.
1. How to handle file paths
There have been two suggestions:
Require a full path for every file that is used
Easy to write a validation script
Portable, no cgat file structure required
Would break with current practice: e.g. /path/to/fasta would replace the genome_dir and genome variables
Require a "file(name)" prefix/suffix in parameter name
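To make the two suggestions concrete, here is a minimal sketch of a validation pass that combines them: any option whose name contains "file" is expected to hold a full path to an existing file. The function name `check_file_params` and the `keyword` argument are hypothetical, and the sketch assumes the ini file has section headers and can be read with Python's standard `configparser`.

```python
import configparser
import os


def check_file_params(ini_text, keyword="file"):
    """Return a list of problems for options whose name contains `keyword`.

    Hypothetical helper: the keyword convention is only one of the
    suggestions under discussion, not an agreed rule.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    problems = []
    for section in parser.sections():
        # raw=True avoids interpolation errors on values containing '%'
        for name, value in parser.items(section, raw=True):
            if keyword not in name:
                continue  # not flagged as a file parameter, so not checked
            if not os.path.isabs(value):
                problems.append(f"[{section}] {name}: not a full path: {value!r}")
            elif not os.path.isfile(value):
                problems.append(f"[{section}] {name}: file not found: {value!r}")
    return problems
```

Parameters without the keyword (a queue name, for instance) are skipped, which is what makes the naming convention necessary in the first place.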
2. How to handle common directories
As per Ian's comment in pull request #331, providing the directory once may be desirable for directories with multiple required files in them. Any ideas on handling something like this using input validation would be helpful:
And what about this case, where basename is later assembled by Python into multiple files, basename.file1 and basename.file2?
feature_dir=/path/to/dir
feature=basename
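One possible way to validate the directory-plus-basename case is to let the validator assemble the same paths the pipeline would and check each one. This is only a sketch: `missing_assembled_files` is a hypothetical name, and the list of suffixes would have to be declared somewhere the validator can see it, which the current ini format does not provide.

```python
import os


def missing_assembled_files(directory, basename, suffixes):
    """Return the assembled paths (directory/basename.suffix) that do not exist.

    Hypothetical helper; `suffixes` is assumed to be known to the validator.
    """
    expected = [os.path.join(directory, f"{basename}.{suffix}") for suffix in suffixes]
    return [path for path in expected if not os.path.isfile(path)]
```

For the example above this would be called as `missing_assembled_files("/path/to/dir", "basename", ["file1", "file2"])`; an empty list means every assembled file was found.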
3. How to deal with defaults
Options
Keep defaults
Empty file with suggestions in comments (+/- a filled-in example)
4. How to deal with mandatory input
Ideas that can be parsed by an input validation script
add "?" for mandatory input and provide user with example input
add "req" or similar suffix/prefix to the parameter name
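As a sketch of the first idea, assuming "?" is shipped as a placeholder value that the user must replace, a validation script could simply report every option still set to "?". The function name `unfilled_mandatory` is hypothetical, and this is only one reading of the suggestion.

```python
import configparser


def unfilled_mandatory(ini_text, placeholder="?"):
    """Return (section, option) pairs whose value is still the placeholder.

    Hypothetical helper: assumes mandatory options ship with the value "?".
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return [
        (section, name)
        for section in parser.sections()
        for name, value in parser.items(section, raw=True)
        if value.strip() == placeholder
    ]
```

The example input for the user can live in a comment line above the option; full-line comments starting with `#` or `;` are ignored by `configparser`.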
Ultimately, the question is, do we want to do this at all?
Apart from having to change all pipeline.ini files and the pipelines (depending on choices), it would also require reconfiguring all your existing inis.
Please note that the existing input validation happens after the PARAMS dictionary has been fully loaded, which involves more steps than parsing the pipeline.ini file. It may parse more than one .ini file, and there are also 1) input given on the command line and 2) hard-coded values (e.g. in Parameters.py). However, it does happen just before running any ruffus task, and I think that's what we need.
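To illustrate why validating the merged dictionary differs from validating a single ini file, here is a rough sketch of such a merge. It is not the actual loading code: `build_params` is a hypothetical name, and both the precedence order (hard-coded values, then ini files in order, then command line) and the flattening of `[section] name` into `section_name` are assumptions made for the example.

```python
import configparser


def build_params(hard_coded, ini_texts, command_line):
    """Merge parameter sources into one dict (hypothetical sketch).

    Assumed precedence, lowest to highest: hard-coded values,
    ini files in the order given, command-line values.
    """
    params = dict(hard_coded)
    for text in ini_texts:
        parser = configparser.ConfigParser()
        parser.read_string(text)
        for section in parser.sections():
            for name, value in parser.items(section, raw=True):
                # Assumed flattening: [general] keys keep their name,
                # other sections are prefixed with the section name.
                key = name if section == "general" else f"{section}_{name}"
                params[key] = value
    params.update(command_line)
    return params
```

A check run on the result sees the values a task would actually use, including ones that never appear in pipeline.ini.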
I think some form of input validation is desirable, but I understand we need to be a bit flexible as well. Here are my thoughts:
we should use full paths whenever possible
provide empty file with suggestions in comments
use "?" for mandatory input and provide examples
A keyword in the parameter name would also be required, for example file and/or dir, to make explicit whether the expected input is a file, a directory or some other configuration parameter (e.g. a job queue name). Moreover, input and output files would need to be distinguished, as the former can be tested but the latter cannot.
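A small sketch of how such a naming convention could drive the checks. The exact tokens are hypothetical: here a name is split on underscores, `file` or `dir` marks a path, and `output` marks something the pipeline will create and that therefore cannot be tested up front.

```python
def classify(name):
    """Classify a parameter name by the keywords it contains.

    Hypothetical convention: tokens are separated by underscores;
    'file'/'dir' mark paths and 'output' marks paths that need not exist yet.
    """
    tokens = name.lower().split("_")
    if "file" in tokens:
        kind = "file"
    elif "dir" in tokens:
        kind = "dir"
    else:
        return "other"  # e.g. a job queue name, not a path
    if "output" in tokens:
        return f"output_{kind}"
    return f"input_{kind}"
```

Under this convention a validator would only test existence for names classified as `input_file` or `input_dir`, and leave `output_*` and `other` parameters alone.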