Updated README and flowchart

Plant-Food-Research-Open · Aug 20, 2024 · d1bc3ce
1 parent 51b7482

Showing 8 changed files with 198 additions and 166 deletions.
diff --git a/.nf-core.yml b/.nf-core.yml
@@ -8,17 +8,21 @@ lint:
     - .github/workflows/awstest.yml
     - .github/workflows/awsfulltest.yml
     - conf/igenomes.config
+    - assets/multiqc_config.yml
   files_unchanged:
     - CODE_OF_CONDUCT.md
     - assets/nf-core-genepal_logo_light.png
     - docs/images/nf-core-genepal_logo_light.png
     - docs/images/nf-core-genepal_logo_dark.png
     - .github/ISSUE_TEMPLATE/bug_report.yml
-  multiqc_config:
-    - report_comment
+    - docs/README.md
+    - LICENSE
   nextflow_config:
     - manifest.name
     - manifest.homePage
+    - config_defaults:
+        - params.ribo_database_manifest
+  multiqc_config: false
 nf_core_version: 2.14.1
 repository_type: pipeline
 template:

diff --git a/README.md b/README.md
@@ -3,58 +3,59 @@
 [![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)
 
 [![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A523.04.0-23aa62.svg)](https://www.nextflow.io/)
-[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
+[![run with conda ❌](http://img.shields.io/badge/run%20with-conda%20❌-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
 [![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
 [![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
 [![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://cloud.seqera.io/launch?pipeline=https://github.com/plant-food-research-open/genepal)
 
 ## Introduction
 
-**plant-food-research-open/genepal** is a bioinformatics pipeline that ...
+**plant-food-research-open/genepal** is a bioinformatics pipeline for single-genome and pan-genome annotation. An overview is shown in the [Pipeline Flowchart](#pipeline-flowchart), and references for the tools are listed in [CITATIONS.md](./CITATIONS.md).
 
-<!-- TODO nf-core:
-   Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
-   major pipeline sections and the types of output it produces. You're giving an overview to someone new
-   to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
--->
+## Pipeline Flowchart
 
-<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
-     workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples.   -->
-<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
+<p align="center"><img src="docs/img/genepal.png" width="100%" style="border-radius: 25px;"></p>
 
-1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
-2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
+- [FASTA VALIDATOR](https://github.com/linsalrob/fasta_validator): Validate genome fasta
+- [EDTA](https://github.com/oushujun/EDTA) or [REPEATMODELER](https://github.com/Dfam-consortium/RepeatModeler): Create TE library
+- [REPEATMASKER](https://github.com/rmhubley/RepeatMasker): Soft-mask the genome fasta
+- [FASTQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc), [FASTP](https://github.com/OpenGene/fastp), [SORTMERNA](https://github.com/sortmerna/sortmerna): QC, trim and filter RNA-seq evidence
+- [BRAKER](https://github.com/Gaius-Augustus/BRAKER): Annotate the genome fasta
+- [LIFTOFF](https://github.com/agshumate/Liftoff): Liftoff annotations from reference genome fasta/gff
+- [TSEBRA](https://github.com/Gaius-Augustus/TSEBRA) and [AGAT](https://github.com/NBISweden/AGAT): Merge BRAKER and Liftoff annotations
+- [EGGNOG-MAPPER](https://github.com/eggnogdb/eggnog-mapper): Add functional annotation to gff
+- [ORTHOFINDER](https://github.com/davidemms/OrthoFinder): Perform phylogenetic orthology inference across input genomes
 
 ## Usage
 
+Refer to the [usage](./docs/usage.md), [parameters](./docs/parameters.md) and [output](./docs/output.md) documents for details.
+
 > [!NOTE]
 > If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
 
-<!-- TODO nf-core: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
-     Explain what rows and columns represent. For instance (please edit as appropriate):
-
-First, prepare a samplesheet with your input data that looks as follows:
+First, prepare an assemblysheet with your input genomes that looks as follows:
 
-`samplesheet.csv`:
+`assemblysheet.csv`:
 
 ```csv
-sample,fastq_1,fastq_2
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
+tag,fasta,is_masked
+a_thaliana,/path/to/genome.fa,yes
 ```
 
-Each row represents a fastq file (single-end) or a pair of fastq files (paired end).
+Each row represents an input genome and the fields are:
 
--->
+- `tag`: A unique tag that identifies the genome throughout the pipeline
+- `fasta`: FASTA file for the genome
+- `is_masked`: `yes` or `no`, denoting whether the FASTA file is already masked
 
-Now, you can run the pipeline using:
-
-<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
+At minimum, a FASTA file of protein evidence is also required. Now, you can run the pipeline using:
 
 ```bash
 nextflow run plant-food-research-open/genepal \
-   -profile <docker/singularity/.../institute> \
-   --input samplesheet.csv \
-   --outdir <OUTDIR>
+  -profile <docker/singularity/.../institute> \
+  --input assemblysheet.csv \
+  --protein_evidence proteins.faa \
+  --outdir <OUTDIR>
 ```
 
 > [!WARNING]
@@ -63,11 +64,44 @@ nextflow run plant-food-research-open/genepal \
 
 ## Credits
 
-plant-food-research-open/genepal was originally written by Usman Rashid, Jason Shiller.
+plant-food-research-open/genepal workflows were originally scripted by Jason Shiller ([@jasonshiller](https://github.com/jasonshiller)). Usman Rashid ([@gallvp](https://github.com/gallvp)) wrote the Nextflow pipeline.
 
 We thank the following people for their extensive assistance in the development of this pipeline:
 
-<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
+- Cecilia Deng [@CeciliaDeng](https://github.com/CeciliaDeng)
+- Charles David [@charlesdavid](https://github.com/charlesdavid)
+- Chen Wu [@christinawu2008](https://github.com/christinawu2008)
+- Leonardo Salgado [@leorippel](https://github.com/leorippel)
+- Ross Crowhurst [@rosscrowhurst](https://github.com/rosscrowhurst)
+- Susan Thomson [@cflsjt](https://github.com/cflsjt)
+- Ting-Hsuan Chen [@ting-hsuan-chen](https://github.com/ting-hsuan-chen)
+
+The pipeline uses nf-core modules contributed by the following authors:
+
+<a href="https://github.com/gallvp"><img src="https://github.com/gallvp.png" width="50" height="50"></a>
+<a href="https://github.com/drpatelh"><img src="https://github.com/drpatelh.png" width="50" height="50"></a>
+<a href="https://github.com/kevinmenden"><img src="https://github.com/kevinmenden.png" width="50" height="50"></a>
+<a href="https://github.com/grst"><img src="https://github.com/grst.png" width="50" height="50"></a>
+<a href="https://github.com/toniher"><img src="https://github.com/toniher.png" width="50" height="50"></a>
+<a href="https://github.com/joseespinosa"><img src="https://github.com/joseespinosa.png" width="50" height="50"></a>
+<a href="https://github.com/edmundmiller"><img src="https://github.com/edmundmiller.png" width="50" height="50"></a>
+<a href="https://github.com/kherronism"><img src="https://github.com/kherronism.png" width="50" height="50"></a>
+<a href="https://github.com/vagkaratzas"><img src="https://github.com/vagkaratzas.png" width="50" height="50"></a>
+<a href="https://github.com/robsyme"><img src="https://github.com/robsyme.png" width="50" height="50"></a>
+<a href="https://github.com/priyanka-surana"><img src="https://github.com/priyanka-surana.png" width="50" height="50"></a>
+<a href="https://github.com/praveenraj2018"><img src="https://github.com/praveenraj2018.png" width="50" height="50"></a>
+<a href="https://github.com/muffato"><img src="https://github.com/muffato.png" width="50" height="50"></a>
+<a href="https://github.com/matthdsm"><img src="https://github.com/matthdsm.png" width="50" height="50"></a>
+<a href="https://github.com/mashehu"><img src="https://github.com/mashehu.png" width="50" height="50"></a>
+<a href="https://github.com/mahesh-panchal"><img src="https://github.com/mahesh-panchal.png" width="50" height="50"></a>
+<a href="https://github.com/jvhagey"><img src="https://github.com/jvhagey.png" width="50" height="50"></a>
+<a href="https://github.com/jfy133"><img src="https://github.com/jfy133.png" width="50" height="50"></a>
+<a href="https://github.com/jemten"><img src="https://github.com/jemten.png" width="50" height="50"></a>
+<a href="https://github.com/friederikehanssen"><img src="https://github.com/friederikehanssen.png" width="50" height="50"></a>
+<a href="https://github.com/felixkrueger"><img src="https://github.com/felixkrueger.png" width="50" height="50"></a>
+<a href="https://github.com/ewels"><img src="https://github.com/ewels.png" width="50" height="50"></a>
+<a href="https://github.com/erikrikarddaniel"><img src="https://github.com/erikrikarddaniel.png" width="50" height="50"></a>
+<a href="https://github.com/charles-plessy"><img src="https://github.com/charles-plessy.png" width="50" height="50"></a>
 
 ## Contributions and Support
 
@@ -78,8 +112,6 @@ If you would like to contribute to this pipeline, please see the [contributing g
 <!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
 <!-- If you use plant-food-research-open/genepal for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->
 
-<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
-
 An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
 
 This pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [MIT license](https://github.com/nf-core/tools/blob/master/LICENSE).
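
A hypothetical pre-flight check for the assemblysheet described in the README changes above can be sketched in Python. It assumes only what the README documents (`tag`, `fasta` and `is_masked` columns, unique tags, `yes`/`no` masking values); the pipeline's own input schema may enforce more or different rules. `validate_assemblysheet` and `build_command` are illustrative names, not part of genepal.

```python
import csv
import io

# Columns documented in the README's assemblysheet example.
REQUIRED_COLUMNS = ("tag", "fasta", "is_masked")
# Masking flags as described in the README ("yes or no").
ALLOWED_MASK_VALUES = {"yes", "no"}


def validate_assemblysheet(text):
    """Parse assemblysheet CSV text and return its rows as dicts.

    Hypothetical pre-flight check; raises ValueError on the first problem found.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        raise ValueError(f"missing column(s): {', '.join(missing)}")

    rows = []
    seen_tags = set()
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        # Short rows yield None for absent fields, so normalise to "".
        tag = (row["tag"] or "").strip()
        fasta = (row["fasta"] or "").strip()
        masked = (row["is_masked"] or "").strip()
        if not tag:
            raise ValueError(f"line {line_no}: empty tag")
        if tag in seen_tags:
            raise ValueError(f"line {line_no}: duplicate tag {tag!r}")
        if not fasta:
            raise ValueError(f"line {line_no}: empty fasta path")
        if masked not in ALLOWED_MASK_VALUES:
            raise ValueError(
                f"line {line_no}: is_masked must be 'yes' or 'no', got {masked!r}"
            )
        seen_tags.add(tag)
        rows.append({"tag": tag, "fasta": fasta, "is_masked": masked})

    if not rows:
        raise ValueError("assemblysheet lists no genomes")
    return rows


def build_command(assemblysheet, protein_evidence, outdir, profile):
    """Return the minimal README invocation as an argv list (no shell needed)."""
    return [
        "nextflow", "run", "plant-food-research-open/genepal",
        "-profile", profile,
        "--input", assemblysheet,
        "--protein_evidence", protein_evidence,
        "--outdir", outdir,
    ]
```

For example, call `validate_assemblysheet(open("assemblysheet.csv").read())` and then `subprocess.run(build_command("assemblysheet.csv", "proteins.faa", "results", "docker"), check=True)`. Passing the arguments as a list sidesteps shell line continuations, so a dropped trailing backslash cannot split the command in two.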

diff --git a/assets/rrna-db-test.txt b/assets/rrna-db-test.txt
diff --git a/conf/test_full.config b/conf/test_full.config
@@ -0,0 +1,28 @@
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Nextflow config file for running full-size tests
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Defines input files and everything required to run a full size pipeline test.
+
+    Use as follows:
+        nextflow run plant-food-research-open/genepal -profile test_full,<docker/singularity> --outdir <OUTDIR>
+
+----------------------------------------------------------------------------------------
+*/
+
+params {
+    config_profile_name         = 'Full test profile'
+    config_profile_description  = 'Full test dataset to check pipeline function'
+
+    // Tested with these minimal resources
+    max_cpus   = 8
+    max_memory = '32.GB'
+    max_time   = '6.h'
+
+    // Input data
+    input                   = "${projectDir}/tests/minimal/assemblysheet.csv"
+    protein_evidence        = 'https://raw.githubusercontent.com/Gaius-Augustus/BRAKER/f58479fe5bb13a9e51c3ca09cb9e137cab3b8471/example/proteins.fa'
+
+    braker_extra_args       = '--gm_max_intergenic 10000 --skipOptimize' // Added for faster test execution! Do not use with actual data!
+    busco_lineage_datasets  = 'eudicots_odb10'
+}