Filtering short contig lengths before annotation #128
base: dev
Hi @yykaya Thank you for the PR. This is very useful. Let's work together to get this merged. A couple of items to tick off before we merge, though.
@@ -79,7 +80,15 @@ params {
custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}"

}

// Validation for the min_contig_length parameter
process {
This is quite clever. However, we use nf-schema for parameter validation, which means that each parameter's type and constraints are defined in a schema file and the plugin validates all parameters automatically. The schema file is here: https://github.com/Plant-Food-Research-Open/genepal/blob/dev/nextflow_schema.json
This schema can be automatically generated and refined through a web-based GUI. Please see the nf-core docs: https://nf-co.re/docs/nf-core-tools/pipelines/schema
@@ -19,6 +19,7 @@ params {
orthofinder_annotations = null
outdir = null
email = null
min_contig_length = 5000
Should we move this to the `// Annotation options` section of the config?
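A minimal sketch of the suggested placement, assuming `nextflow.config` groups its parameters under comment headers such as `// Annotation options`. The neighbouring parameters are omitted rather than guessed, and the default of 5000 is the value from this PR's diff:

```nextflow
params {
    // ... other parameter groups stay where they are ...

    // Annotation options
    // (existing annotation parameters are omitted from this sketch)
    min_contig_length = 5000 // minimum contig length kept for annotation
}
```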
// WORKFLOW: Run main workflow
// Filter genome assembly by minimum contig length
//
SEQKIT_GET_LENGTH(PIPELINE_INITIALISATION.out.target_assembly)
`SEQKIT_GET_LENGTH` should be part of the `GENEPAL` workflow defined in the `workflows/genepal.nf` file. This structure is also inherited from the nf-core template and allows the creation of meta-pipelines, where two or more pipelines are joined into a single larger pipeline.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
*/ | ||
|
||
process SEQKIT_GET_LENGTH { |
Can we use existing nf-core modules instead of a custom local module?
Nonetheless, custom local modules should be placed in the `modules/local/` directory.
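As one possible way to do this, the sketch below assumes the nf-core `seqkit/seq` module (process `SEQKIT_SEQ`) is the module adopted, and that process options live in a modules config file such as `conf/modules.config`; neither choice is confirmed in this PR. It passes the threshold to `seqkit seq` through `ext.args`, using the `--min-len` flag, which keeps only sequences at least that long:

```nextflow
// Sketch only: the module choice (seqkit/seq) and the config file location are assumptions.
process {
    withName: 'SEQKIT_SEQ' {
        // Closure so the value is resolved at run time from the pipeline parameter
        ext.args = { "--min-len ${params.min_contig_length}" }
    }
}
```

With this approach no custom process is needed. The module would be installed with `nf-core modules install seqkit/seq` and called from the `GENEPAL` workflow.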
@@ -71,6 +71,16 @@ Each row represents an input genome and the fields are:
- `fasta:` fasta file for the genome
- `is_masked`: yes or no to denote whether the fasta file is already masked or not

#### `--min_contig_length`
Parameter documentation is auto-generated with the following command:
nf-core -v pipelines schema docs > docs/parameters.md
The parameters are documented in the `docs/parameters.md` file, so there is no need to document them by hand elsewhere.
We use pre-commit to automatically fix code linting issues. You can enable pre-commit with:
pip install pre-commit
cd genepal
pre-commit install
git add -A
pre-commit run --all-files
nf-core linting is failing (https://github.com/Plant-Food-Research-Open/genepal/actions/runs/12314789807/job/34373180357?pr=128) because the new parameter has not been added to the pipeline schema. You can add it with:
nf-core -v pipelines schema build
I've added a filtering step for input assemblies to avoid poor downstream analysis and misleading interpretations of gene variation in each contig. It works well on my local computer but has not been tested on an HPC yet.
PR checklist
- `nf-core pipelines lint`
- `nextflow run . -profile test,docker --outdir <OUTDIR>`
- `nextflow run . -profile debug,test,docker --outdir <OUTDIR>`
- `docs/usage.md` is updated.
- `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).