Commit

Updated parameter section labels
GallVp committed Oct 4, 2024
1 parent 0c0269c commit 6ec662c
Showing 7 changed files with 155 additions and 116 deletions.
5 changes: 3 additions & 2 deletions CHANGELOG.md
@@ -3,7 +3,7 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## 0.4.0+dev - [30-Sep-2024]
## 0.4.0+dev - [04-Oct-2024]

### `Added`

@@ -30,7 +30,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
21. Now `REPEATMASKER` GFF output is saved via `CUSTOM_RMOUTTOGFF3` [#54](https://github.com/plant-food-research-open/genepal/issues/54)
22. Added `benchmark` column to the input sheet and used `GFFCOMPARE` to perform benchmarking [#63](https://github.com/plant-food-research-open/genepal/issues/63)
23. Added `SEQKIT_RMDUP` to detect duplicate sequences and wrap the fasta to 80 characters
24. Updated modules and sub-workflows
24. Updated parameter section labels for annotation and post-annotation filtering [#64](https://github.com/plant-food-research-open/genepal/issues/64)
25. Updated modules and sub-workflows

### `Fixed`

2 changes: 1 addition & 1 deletion README.md
@@ -20,7 +20,7 @@
- [REPEATMODELER](https://github.com/Dfam-consortium/RepeatModeler) or [EDTA](https://github.com/oushujun/EDTA): Create TE library
- [REPEATMASKER](https://github.com/rmhubley/RepeatMasker): Soft mask the genome fasta
- [FASTQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc), [FASTP](https://github.com/OpenGene/fastp), [SORTMERNA](https://github.com/sortmerna/sortmerna): QC, trim and filter RNASeq evidence
- [STAR](https://github.com/alexdobin/STAR): RNAseq alignment
- [STAR](https://github.com/alexdobin/STAR): RNASeq alignment
- [BRAKER](https://github.com/Gaius-Augustus/BRAKER): Annotate the genome fasta
- [LIFTOFF](https://github.com/agshumate/Liftoff): Liftoff annotations from reference genome fasta/gff
- [TSEBRA](https://github.com/Gaius-Augustus/TSEBRA), [AGAT](https://github.com/NBISweden/AGAT): Merge BRAKER and Liftoff annotations
12 changes: 6 additions & 6 deletions docs/output.md
@@ -14,8 +14,8 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d

- [Repeat annotation](#repeat-annotation)
- [Repeat masking](#repeat-masking)
- [RNAseq trimming, filtering and QC](#rnaseq-trimming-filtering-and-qc)
- [RNAseq alignment](#rnaseq-alignment)
- [RNASeq trimming, filtering and QC](#rnaseq-trimming-filtering-and-qc)
- [RNASeq alignment](#rnaseq-alignment)
- [Annotation with BRAKER](#annotation-with-braker)
- [Annotation with Liftoff](#annotation-with-liftoff)
- [Annotation filtering and merging](#annotation-filtering-and-merging)
@@ -50,7 +50,7 @@ A repeat library is created with either [REPEATMODELER](https://github.com/Dfam-

Soft masking of the repeats is performed with [REPEATMASKER](https://github.com/rmhubley/RepeatMasker) using the repeat library prepared in the previous step. Masking outputs are saved to the output directory only if the `repeatmasker_save_outputs` parameter is set to `true` (default: `false`).

### RNAseq trimming, filtering and QC
### RNASeq trimming, filtering and QC

<details markdown="1">
<summary>Output files</summary>
@@ -79,7 +79,7 @@ Soft masking of the repeats is performed with [REPEATMASKER](https://github.com/

RNASeq reads are trimmed with [FASTP](https://github.com/OpenGene/fastp) and are QC'ed with [FASTQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc). Ribosomal reads are filtered out using [SORTMERNA](https://github.com/sortmerna/sortmerna). Trimmed reads are only stored to the output directory if the `save_trimmed` parameter is set to `true` (default: `false`). Reads filtered by [SORTMERNA](https://github.com/sortmerna/sortmerna) are stored to the output directory if the `save_non_ribo_reads` parameter is set to `true` (default: `false`).

### RNAseq alignment
### RNASeq alignment

<details markdown="1">
<summary>Output files</summary>
@@ -93,7 +93,7 @@ RNASeq reads are trimmed with [FASTP](https://github.com/OpenGene/fastp) and are

</details>

RNAseq alignment is performed with [STAR](https://github.com/alexdobin/STAR). Alignment files are only stored to the output directory if the `star_save_outputs` parameter is set to `true` (default: `false`). Concatenated bam files are stored to the output directory if the `save_cat_bam` parameter is set to `true` (default: `false`).
RNASeq alignment is performed with [STAR](https://github.com/alexdobin/STAR). Alignment files are only stored to the output directory if the `star_save_outputs` parameter is set to `true` (default: `false`). Concatenated bam files are stored to the output directory if the `save_cat_bam` parameter is set to `true` (default: `false`).

### Annotation with BRAKER

@@ -112,7 +112,7 @@ RNAseq alignment is performed with [STAR](https://github.com/alexdobin/STAR). Al

</details>

[BRAKER](https://github.com/Gaius-Augustus/BRAKER) is used to annotate each genome assembly using the provided protein and RNAseq evidence. Outputs from BRAKER are stored to the output directory if the `braker_save_outputs` parameter is set to `true` (default: `false`).
[BRAKER](https://github.com/Gaius-Augustus/BRAKER) is used to annotate each genome assembly using the provided protein and RNASeq evidence. Outputs from BRAKER are stored to the output directory if the `braker_save_outputs` parameter is set to `true` (default: `false`).

> [!CAUTION]
>
38 changes: 24 additions & 14 deletions docs/parameters.md
@@ -37,7 +37,7 @@ A Nextflow pipeline for single genome, multiple genomes and pan-genome annotatio
| `save_non_ribo_reads` | Save FASTQ files after Ribosomal RNA removal or not? | `boolean` | | | |
| `ribo_database_manifest` | Ribosomal RNA fastas listed in a text sheet | `string` | ${projectDir}/assets/rrna-db-defaults.txt | | |

## RNAseq alignment options
## RNASeq alignment options

| Parameter | Description | Type | Default | Required | Hidden |
| ------------------------ | ------------------------------------------------- | --------- | ------- | -------- | ------ |
@@ -48,19 +48,29 @@ A Nextflow pipeline for single genome, multiple genomes and pan-genome annotatio

## Annotation options

| Parameter | Description | Type | Default | Required | Hidden |
| ----------------------------- | --------------------------------------------------------------------------------- | --------- | ------- | -------- | ------ |
| `braker_extra_args` | Extra arguments for BRAKER | `string` | | | |
| `braker_save_outputs` | Save BRAKER files | `boolean` | | | |
| `liftoff_coverage` | Liftoff coverage parameter | `number` | 0.9 | | |
| `liftoff_identity` | Liftoff identity parameter | `number` | 0.9 | | |
| `allow_isoforms` | Allow multiple isoforms for gene models | `boolean` | True | | |
| `enforce_full_intron_support` | Require every model to have external evidence for all its introns | `boolean` | True | | |
| `filter_liftoff_by_hints` | Use BRAKER hints to filter Liftoff models | `boolean` | True | | |
| `eggnogmapper_evalue` | Only report alignments below or equal to the e-value threshold | `number` | 1e-05 | | |
| `eggnogmapper_pident` | Only report alignments above or equal to the given percentage of identity (0-100) | `integer` | 35 | | |
| `eggnogmapper_purge_nohits` | Purge transcripts which do not have a hit against eggnog | `boolean` | | | |
| `add_attrs_to_proteins_fasta` | Add gff attributes to proteins fasta | `boolean` | | | |
| Parameter | Description | Type | Default | Required | Hidden |
| --------------------- | --------------------------------------------------------------------------------- | --------- | ------- | -------- | ------ |
| `braker_extra_args` | Extra arguments for BRAKER | `string` | | | |
| `liftoff_coverage` | Liftoff coverage parameter | `number` | 0.9 | | |
| `liftoff_identity` | Liftoff identity parameter | `number` | 0.9 | | |
| `eggnogmapper_evalue` | Only report alignments below or equal to the e-value threshold | `number` | 1e-05 | | |
| `eggnogmapper_pident` | Only report alignments above or equal to the given percentage of identity (0-100) | `integer` | 35 | | |

## Post-annotation filtering options

| Parameter | Description | Type | Default | Required | Hidden |
| ----------------------------- | ----------------------------------------------------------------- | --------- | ------- | -------- | ------ |
| `allow_isoforms` | Allow multiple isoforms for gene models | `boolean` | True | | |
| `enforce_full_intron_support` | Require every model to have external evidence for all its introns | `boolean` | True | | |
| `filter_liftoff_by_hints` | Use BRAKER hints to filter Liftoff models | `boolean` | True | | |
| `eggnogmapper_purge_nohits` | Purge transcripts which do not have a hit against eggnog | `boolean` | | | |

## Annotation output options

| Parameter | Description | Type | Default | Required | Hidden |
| ----------------------------- | ------------------------------------ | --------- | ------- | -------- | ------ |
| `braker_save_outputs` | Save BRAKER files | `boolean` | | | |
| `add_attrs_to_proteins_fasta` | Add gff attributes to proteins fasta | `boolean` | | | |

## Evaluation options

16 changes: 8 additions & 8 deletions docs/usage.md
@@ -7,7 +7,7 @@
- [Assemblysheet input](#assemblysheet-input)
- [Protein evidence](#protein-evidence)
- [BRAKER workflow](#braker-workflow)
- [RNAseq evidence](#rnaseq-evidence)
- [RNASeq evidence](#rnaseq-evidence)
- [BRAKER workflow](#braker-workflow-1)
- [Preprocessing](#preprocessing)
- [Alignment](#alignment)
@@ -52,11 +52,11 @@ Protein evidence can be provided in two ways. First, a single FASTA file. Second

With these two parameters, the pipeline has sufficient inputs to execute the [BRAKER workflow C](https://github.com/Gaius-Augustus/BRAKER/tree/f58479fe5bb13a9e51c3ca09cb9e137cab3b8471?tab=readme-ov-file#overview-of-modes-for-running-braker) (see Figure 4) in which GeneMark-EP+ is trained on protein spliced alignments, then GeneMark-EP+ generates training data for AUGUSTUS which then performs the final gene prediction.

## RNAseq evidence
## RNASeq evidence

> ❔ Optional `--rna_evidence`

RNAseq evidence must be provided through a samplesheet in CSV format which has the following columns,
RNASeq evidence must be provided through a samplesheet in CSV format which has the following columns,

- `sample:` A sample identifier. The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis.
- `file_1:` A FASTQ or BAM file
@@ -65,17 +65,17 @@ RNAseq evidence must be provided through a samplesheet in CSV format which has t

### BRAKER workflow

If RNAseq evidence is provided, the pipeline executes the [BRAKER workflow D](https://github.com/Gaius-Augustus/BRAKER/tree/f58479fe5bb13a9e51c3ca09cb9e137cab3b8471?tab=readme-ov-file#overview-of-modes-for-running-braker) (see Figure 4) in which GeneMark-ETP is trained with both protein and RNASeq evidence and the training data generated by GeneMark-ETP is used to optimise AUGUSTUS for final gene predictions.
If RNASeq evidence is provided, the pipeline executes the [BRAKER workflow D](https://github.com/Gaius-Augustus/BRAKER/tree/f58479fe5bb13a9e51c3ca09cb9e137cab3b8471?tab=readme-ov-file#overview-of-modes-for-running-braker) (see Figure 4) in which GeneMark-ETP is trained with both protein and RNASeq evidence and the training data generated by GeneMark-ETP is used to optimise AUGUSTUS for final gene predictions.

### Preprocessing

RNAseq reads provided in FASTQ files are by default trimmed with [FASTP](https://github.com/OpenGene/fastp). No extra FASTP arguments are set by default, although additional ones can be supplied with the `--extra_fastp_args` parameter. After trimming, any sample with fewer than `10000` reads left is dropped. This threshold can be changed with the `--min_trimmed_reads` parameter. If trimming was already performed or is not desirable, it can be skipped by setting the `--skip_fastp` flag to `true`.
RNASeq reads provided in FASTQ files are by default trimmed with [FASTP](https://github.com/OpenGene/fastp). No extra FASTP arguments are set by default, although additional ones can be supplied with the `--extra_fastp_args` parameter. After trimming, any sample with fewer than `10000` reads left is dropped. This threshold can be changed with the `--min_trimmed_reads` parameter. If trimming was already performed or is not desirable, it can be skipped by setting the `--skip_fastp` flag to `true`.

Optionally, [SORTMERNA](https://github.com/sortmerna/sortmerna) can be activated by setting the `--remove_ribo_rna` flag to `true`. A default list of rRNA databases is pre-configured and can be seen in the [assets/rrna-db-defaults.txt](../assets/rrna-db-defaults.txt) file. A path to a custom list of databases can be specified by the `--ribo_database_manifest` parameter.
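
The preprocessing options above could be combined on the command line roughly as follows. This is a minimal sketch, not a complete invocation: the pipeline reference is inferred from the issue URLs in the changelog, the file names and the `20000` threshold are hypothetical placeholders, and the other required inputs (assemblysheet, protein evidence, output directory) are omitted.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumptions (not taken from the docs): pipeline reference and file names.
pipeline="plant-food-research-open/genepal"
rna_sheet="rnaseq_samplesheet.csv"      # hypothetical samplesheet path
ribo_manifest="custom-rrna-db.txt"      # hypothetical custom rRNA database list

# Build the command as an array so each argument stays a separate word.
cmd=(
  nextflow run "$pipeline"
  --rna_evidence "$rna_sheet"
  --min_trimmed_reads 20000
  --remove_ribo_rna true
  --ribo_database_manifest "$ribo_manifest"
)

# Print the command for review instead of launching it.
printf '%s ' "${cmd[@]}"
printf '\n'
```

Once the remaining required inputs are appended to the array, the command can be launched with `"${cmd[@]}"`. To skip trimming instead, `--skip_fastp true` would be added in the same way.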

### Alignment

RNAseq evidence provided as FASTQ files is aligned using [STAR](https://github.com/alexdobin/STAR). The default alignment parameters are,
RNASeq evidence provided as FASTQ files is aligned using [STAR](https://github.com/alexdobin/STAR). The default alignment parameters are,

```bash
--outSAMstrandField intronMotif \
@@ -88,7 +88,7 @@ where `--star_max_intron_length` is a pipeline parameter and its default value i

> [!WARNING]
>
> If pre-aligned RNAseq data is provided as a BAM file and the alignment was not performed with the `--outSAMstrandField intronMotif` parameter, the pipeline might throw an error.
> If pre-aligned RNASeq data is provided as a BAM file and the alignment was not performed with the `--outSAMstrandField intronMotif` parameter, the pipeline might throw an error.

## Liftoff annotations

@@ -156,7 +156,7 @@ If there are more than one target assemblies, an orthology inference is performe

## Iso-forms and full intron support

By default the pipeline allows multiple isoforms from BRAKER. This behavior can be changed by setting the `--allow_isoforms` flag to `false`. Moreover, every intron from every model from BRAKER and LIFTOFF must have support from protein or RNAseq evidence. This is enforced with [TSEBRA](https://github.com/Gaius-Augustus/TSEBRA). This requirement can be removed by setting the `--enforce_full_intron_support` flag to `false`. Alternatively, the criterion can be applied to BRAKER models only by setting the `--filter_liftoff_by_hints` flag to `false`.
By default the pipeline allows multiple isoforms from BRAKER. This behavior can be changed by setting the `--allow_isoforms` flag to `false`. Moreover, every intron from every model from BRAKER and LIFTOFF must have support from protein or RNASeq evidence. This is enforced with [TSEBRA](https://github.com/Gaius-Augustus/TSEBRA). This requirement can be removed by setting the `--enforce_full_intron_support` flag to `false`. Alternatively, the criterion can be applied to BRAKER models only by setting the `--filter_liftoff_by_hints` flag to `false`.
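
As an illustration of the filtering flags described above, the sketch below collects one possible combination: multiple isoforms disabled, and full intron support kept for BRAKER models but not applied to Liftoff models. The flag names come from the text; the chosen values are only an example, and the array name `filter_args` is a hypothetical helper.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Example values only; the documented defaults for all three flags are true.
filter_args=(
  --allow_isoforms false                # do not allow multiple isoforms
  --enforce_full_intron_support true    # keep the intron-support requirement
  --filter_liftoff_by_hints false       # do not filter Liftoff models by hints
)

# Print the flags so they can be appended to a `nextflow run` command.
printf '%s ' "${filter_args[@]}"
printf '\n'
```

Keeping the flags in an array makes it straightforward to append them to an existing command, for example `nextflow run <pipeline> "${filter_args[@]}"`.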

## Running the pipeline
