Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filtering short contig lenghts before annotation #128

Open
wants to merge 2 commits into
base: dev
Choose a base branch
from

Conversation

yykaya
Copy link

@yykaya yykaya commented Dec 13, 2024

I've added a filtering step for input assemblies to avoid poor downstream analysis and misleading interpretations of gene variations in each contigs. It works well in local computer but not tested in hpc yet.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@GallVp
Copy link
Member

GallVp commented Dec 13, 2024

Hi @yykaya

Thank you for the PR. This is very useful. Let's work together to get this merged. A couple of items to tick off before we merge, though.

  • We are using the nf-core template. It requires that the main branch is in sync with the latest release. Thus, all new contributions should target the dev branch. I have done this by clicking the edit button at the top and changing the base to dev.
  • Please see the other comments.

@GallVp GallVp changed the base branch from main to dev December 13, 2024 23:20
@@ -79,7 +80,15 @@ params {
custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}"

}

// Validation for the min_contig_length parameter
process {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite clever. However, we are using nf-schema for parameter validation which means that the parameter type and constraints are defined in a schema file and the plugin automatically validates all the parameters. The schema file is here: https://github.com/Plant-Food-Research-Open/genepal/blob/dev/nextflow_schema.json

This schema can be automatically generated and refined through a web-based GUI. Please see the nf-core docs: https://nf-co.re/docs/nf-core-tools/pipelines/schema

@@ -19,6 +19,7 @@ params {
orthofinder_annotations = null
outdir = null
email = null
min_contig_length = 5000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move this to the // Annotation options section of the config?

// WORKFLOW: Run main workflow
// Filter genome assembly by minimum contig length
//
SEQKIT_GET_LENGTH(PIPELINE_INITIALISATION.out.target_assembly)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SEQKIT_GET_LENGTH should be part of the GENEPAL workflow defined in workflows/genepal.nf file. This structure is also inherited from the nf-core template and allows creating of meta-pipelines where two are more pipelines can be joined into a larger single pipeline.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

process SEQKIT_GET_LENGTH {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use existing nf-core modules instead of a custom local module?

Nonetheless, custom local modules should be placed in the modules/local/ directory.

@@ -71,6 +71,16 @@ Each row represents an input genome and the fields are:
- `fasta:` fasta file for the genome
- `is_masked`: yes or no to denote whether the fasta file is already masked or not

#### `--min_contig_length`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameter documentation is auto generated with the following command,

nf-core -v pipelines schema docs > docs/parameters.md

The parameters are documented in the docs/parameters.md file.

@GallVp
Copy link
Member

GallVp commented Dec 13, 2024

We use pre-commit to automatically fix code linting issues. You can enable pre-commit by doing,

pip install pre-commit

cd genepal
pre-commit install

git add -A
pre-commit run --all-files

@GallVp
Copy link
Member

GallVp commented Dec 13, 2024

nf-core linting is failing (https://github.com/Plant-Food-Research-Open/genepal/actions/runs/12314789807/job/34373180357?pr=128) because the new parameter has not been added to the pipeline schema. You can add it by doing,

nf-core -v pipelines schema build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants