Add RoseTTAFold-All-Atom #220

jscgh · 2024-11-20T05:19:10Z

Adds RoseTTAFold-All-Atom as a module per #197.

PR checklist

This comment contains a description of changes (with reason).
If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
Make sure your code lints (nf-core lint).
Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
Usage Documentation in docs/usage.md is updated.
Output Documentation in docs/output.md is updated.
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).

JoseEspinosa

Just left some minor suggestions, but otherwise LGTM. Awesome job @jscgh and @nbtm-sh! 🚀

JoseEspinosa · 2025-02-14T10:16:08Z

README.md

@@ -53,7 +55,7 @@ nextflow run nf-core/proteinfold \
   --outdir <OUTDIR>
 ```

-The pipeline takes care of downloading the databases and parameters required by AlphaFold2, Colabfold or ESMFold. In case you have already downloaded the required files, you can skip this step by providing the path to the databases using the corresponding parameter [`--alphafold2_db`], [`--colabfold_db`] or [`--esmfold_db`]. Please refer to the [usage documentation](https://nf-co.re/proteinfold/usage) to check the directory structure you need to provide for each of the databases.
+The pipeline takes care of downloading the databases and parameters required by AlphaFold2, Colabfold or ESMFold. In case you have already downloaded the required files, you can skip this step by providing the path to the databases using the corresponding parameter [`--alphafold2_db`], [`--colabfold_db`], [`--esmfold_db`] or ['--rosettafold_all_atom_db']. Please refer to the [usage documentation](https://nf-co.re/proteinfold/usage) to check the directory structure you need to provide for each of the databases.


Suggested change

-The pipeline takes care of downloading the databases and parameters required by AlphaFold2, Colabfold or ESMFold. In case you have already downloaded the required files, you can skip this step by providing the path to the databases using the corresponding parameter [`--alphafold2_db`], [`--colabfold_db`], [`--esmfold_db`] or ['--rosettafold_all_atom_db']. Please refer to the [usage documentation](https://nf-co.re/proteinfold/usage) to check the directory structure you need to provide for each of the databases.
+The pipeline takes care of downloading the databases and parameters required by AlphaFold2, Colabfold, ESMFold or RoseTTAFold-All-Atom. In case you have already downloaded the required files, you can skip this step by providing the path to the databases using the corresponding parameter [`--alphafold2_db`], [`--colabfold_db`], [`--esmfold_db`] or [`--rosettafold_all_atom_db`]. Please refer to the [usage documentation](https://nf-co.re/proteinfold/usage) to check the directory structure you must provide for each database.

JoseEspinosa · 2025-02-14T10:35:53Z

conf/modules_rosettafold_all_atom.config

+        ext.prefix = File name prefix for output files.
+----------------------------------------------------------------------------------------
+*/
+


Aren't we missing the declaration that sets the path where the downloaded DBs and parameters are saved? E.g., for AF2:

proteinfold/conf/modules_alphafold2.config

Lines 18 to 23 in 1af71b4

withName: 'GUNZIP|COMBINE_UNIPROT|DOWNLOAD_PDBMMCIF|ARIA2_PDB_SEQRES' {
    publishDir = [
        path: {"${params.outdir}/DBs/alphafold2/${params.alphafold2_mode}"},
        mode: 'symlink',
        saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
    ]
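For reference, a RoseTTAFold-All-Atom counterpart to the AF2 block might look like the sketch below. It is a config fragment under stated assumptions: the process names in the selector are hypothetical placeholders (they should be replaced with the actual download processes launched from `prepare_rosettafold_all_atom_dbs.nf`), and the `DBs/rosettafold_all_atom` output directory is an assumed layout that simply mirrors the AlphaFold2 one without a mode subfolder.

```groovy
process {
    // HYPOTHETICAL selector: replace with the real download process names
    // used by prepare_rosettafold_all_atom_dbs.nf
    withName: 'ARIA2_ROSETTAFOLD_ALL_ATOM|UNTAR_ROSETTAFOLD_ALL_ATOM' {
        publishDir = [
            // Assumed output layout, mirroring the AlphaFold2 DBs directory
            path: { "${params.outdir}/DBs/rosettafold_all_atom" },
            // Symlink instead of copying the (large) databases
            mode: 'symlink',
            // Do not publish the versions.yml emitted by each process
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
        ]
    }
}
```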

JoseEspinosa · 2025-02-14T10:40:38Z

conf/test_rosettafold_all_atom.config

+process {
+    withName: 'RUN_ROSETTAFOLD_ALL_ATOM' {
+        container = '/srv/scratch/sbf-pipelines/proteinfold/singularity/rosettafold_all_atom.sif'
+    }
+}


Suggested change

-process {
-    withName: 'RUN_ROSETTAFOLD_ALL_ATOM' {
-        container = '/srv/scratch/sbf-pipelines/proteinfold/singularity/rosettafold_all_atom.sif'
-    }
-}
+process {
+    withName: 'RUN_ROSETTAFOLD_ALL_ATOM' {
+        container = 'biocontainers/gawk:5.1.0'
+    }
+}

JoseEspinosa · 2025-02-14T10:44:45Z

dockerfiles/rosettafold_all_atom.def

@@ -0,0 +1,41 @@
+Bootstrap: docker


For the other modes, we only provide the Dockerfiles, not the Singularity definitions, since we are not making those public. For consistency, we would need to either create Singularity definitions for all modes or delete this one. I would do the latter, but we can discuss it if you think otherwise.

JoseEspinosa · 2025-02-14T10:46:21Z

dockerfiles/Dockerfile_nfcore-proteinfold_rosettafold_all_atom

+
+LABEL Author="[email protected]" \
+    title="nfcore/proteinfold_rosettafold_all_atom" \
+    Version="0.9.0" \


Suggested change

-    Version="0.9.0" \
+    Version="1.2.0dev" \

JoseEspinosa · 2025-02-14T11:07:04Z

nextflow.config

@@ -214,6 +229,7 @@ profiles {
    apptainer {
        apptainer.enabled       = true
        apptainer.autoMounts    = true
+        if (params.use_gpu) { apptainer.runOptions = '--nv' }


This will make the Nextflow language server happy

Suggested change

-        if (params.use_gpu) { apptainer.runOptions = '--nv' }
+params.use_gpu ? '--nv' : apptainer.runOptions
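As it appears here, the suggested line reads as a bare ternary expression; to take effect it presumably needs to be assigned to `apptainer.runOptions`. A minimal sketch of the `apptainer` profile with that assignment, assuming `params.use_gpu` is declared in the `params` scope of `nextflow.config` and that an empty string is an acceptable value when no GPU is requested:

```groovy
profiles {
    apptainer {
        apptainer.enabled    = true
        apptainer.autoMounts = true
        // --nv enables NVIDIA GPU support inside the Apptainer container;
        // pass it only when GPU use is requested, otherwise no extra options
        apptainer.runOptions = params.use_gpu ? '--nv' : ''
    }
}
```

Using a single assignment with a ternary, rather than an `if` block, keeps the profile purely declarative, which is what the Nextflow language server expects in config files.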

JoseEspinosa · 2025-02-14T11:16:04Z

nextflow_schema.json

@@ -80,7 +80,6 @@
                },
                "full_dbs": {
                    "type": "boolean",
-                    "default": false,


This is set to false by default, isn't it?

proteinfold/nextflow.config

Line 21 in 16fd780

full_dbs = false // true full_dbs, false reduced_dbs

Suggested change

"default": false,

"default": false,

JoseEspinosa · 2025-02-14T11:26:36Z

nextflow_schema.json

@@ -675,5 +688,42 @@
        {
            "$ref": "#/$defs/generic_options"
        }
-    ]
+    ],


For consistency, maybe create `rosettafold_all_atom_dbs_and_parameters_path_options` and `rosettafold_all_atom_dbs_and_parameters_link_options` definitions. To modify the schema you can use `nf-core pipelines schema build` (see here), which provides a GUI in case it is handy for you 😄
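A minimal sketch of what the path-options definition and its `allOf` reference could look like, showing only the new entries (the rest of `nextflow_schema.json` is omitted). It assumes `--rosettafold_all_atom_db` is the only path parameter for RoseTTAFold-All-Atom; the title and description strings are illustrative, and the matching `..._link_options` definition would follow the same pattern with the link parameters the PR actually defines.

```json
{
  "$defs": {
    "rosettafold_all_atom_dbs_and_parameters_path_options": {
      "title": "RoseTTAFold-All-Atom DBs and parameters path options",
      "type": "object",
      "description": "Paths to the RoseTTAFold-All-Atom databases and parameters.",
      "properties": {
        "rosettafold_all_atom_db": {
          "type": "string",
          "format": "path",
          "description": "Path to the directory containing the RoseTTAFold-All-Atom databases and parameters."
        }
      }
    }
  },
  "allOf": [
    { "$ref": "#/$defs/rosettafold_all_atom_dbs_and_parameters_path_options" }
  ]
}
```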

JoseEspinosa · 2025-02-14T11:31:55Z

workflows/rosettafold_all_atom.nf

+//
+// MODULE: Loaded from modules/local/
+//
+include { RUN_ROSETTAFOLD_ALL_ATOM      } from '../modules/local/run_rosettafold_all_atom'


Suggested change

-include { RUN_ROSETTAFOLD_ALL_ATOM      } from '../modules/local/run_rosettafold_all_atom'
+include { RUN_ROSETTAFOLD_ALL_ATOM } from '../modules/local/run_rosettafold_all_atom'

JoseEspinosa · 2025-02-14T11:33:38Z

workflows/rosettafold_all_atom.nf

+    ch_pdb            = ch_pdb.mix(RUN_ROSETTAFOLD_ALL_ATOM.out.pdb)
+    ch_versions       = ch_versions.mix(RUN_ROSETTAFOLD_ALL_ATOM.out.versions)


Suggested change

-    ch_pdb            = ch_pdb.mix(RUN_ROSETTAFOLD_ALL_ATOM.out.pdb)
-    ch_versions       = ch_versions.mix(RUN_ROSETTAFOLD_ALL_ATOM.out.versions)
+    ch_pdb = ch_pdb.mix(RUN_ROSETTAFOLD_ALL_ATOM.out.pdb)
+    ch_versions = ch_versions.mix(RUN_ROSETTAFOLD_ALL_ATOM.out.versions)

nbtm-sh and others added 30 commits July 29, 2024 13:32

feat(katana.config): Created file katana.config

aa9be9f

feat(katana.config): Added params for PBS queues

678b212

feat(katana.config): Added executor parameter to allow the use Katana scheduler

82466e0

feat(katana.config): Added label configs for pushing to GPU partition

8d2a771

Merge pull request #1 from Australian-Structural-Biology-Computing/create-katana-config

1ad4996

Create Katana config

feat(run_alphafold2): Added 'gpu_compute' label to the Alphafold process

3c1fb28

Merge pull request #2 from Australian-Structural-Biology-Computing/add-labels-to-gpu-processes

67704b4

Add labels to GPU processes

feat(run_alphafold2_pred): Added 'gpu_compute' label

e8d2abb

Merge pull request #3 from Australian-Structural-Biology-Computing/add-labels-to-gpu-processes

2021623

Add GPU Compute label

revert(run_alphafold2.nf): Removed GPU compute label from pipeline

13dd1cb

Merge pull request #4 from Australian-Structural-Biology-Computing/add-labels-to-gpu-processes

b0d483e

Remove GPU Compute label from CPU pipeline

Updated database links

2444395

modified: conf/dbs.config modified: modules/local/run_alphafold2.nf Added variable links in dbs.config and run_alphafold2.nf

Merge pull request #6 from Australian-Structural-Biology-Computing/database

4ce2e18

Updated database links

feat(pf_files): Added testing files

8b9452a

Merge branch 'unsw-dev' into add-testing-files

32311ba

Merge pull request #7 from Australian-Structural-Biology-Computing/add-testing-files

e676c33

Add testing files

fix(proteinfold_test.sh): Made path to main.nf rel

0962f91

revert(base.config): Changed executor back to local for testing as cluster tooling is incomlete

992d6d1

fix(proteinfold_test.sh): Changed mode to 'split_msa_production'

32d466c

Merge pull request #2 from nf-core/master

d135dc8

Sync master branch

fix(dbs.conf): Updated dbs.conf to work on UNSW infrastructure

b3140e7

fix(run_alphafold2_msa): Fixed incorrectly named files

4047e62

fix(run_alphafold2_pred): Fixed incorrectly named files

93513bc

fix(proteinfold_test.sh): Added singulairty argument

a007d5a

fix(samplesheet): Changed sample to a much smaller sample

232c8c9

fix(samplesheet): Changed sampel to a smaller sample

03f2575

Merge pull request #8 from Australian-Structural-Biology-Computing/add-dbs-variables-to-msa-pipeline

632610b

Make pipeline work on UNSW Katana

feat(conf/dbs): Added variables for database names, and file names

964f5d0

feat(conf/dbs): Changed config paths to use database variables instead of hardcoded values

2a79fe4

feat(run_alphafold2): Changed hardcoded paths to use variables and updated variable names

c218ad2

jscgh added 10 commits November 20, 2024 15:36

Aligned RoseTTAFold-All-Atom module to dev base

de835f9

Aligned RoseTTAFold-All-Atom module to dev base

ed1cf3a

Aligned RoseTTAFold-All-Atom module to dev base

11347c9

Fixed tests

d4c1cbc

Fixed tests

5fcf98b

Updated CHANGELOG started on other docs

0050033

Updated CHANGELOG started on other docs

b3582d7

Working multiqc for RFAA

6c62771

RFAA dbs will now be downloaded by prepare_rosettafold_all_atom_dbs.nf

aaafcd2

Added docs

b2616f6

jscgh changed the title ~~[WIP] Add RoseTTAFold-All-Atom~~ Add RoseTTAFold-All-Atom Nov 27, 2024

jscgh added 3 commits November 27, 2024 15:17

Resolved conflict

4144328

Merge remote-tracking branch 'upstream/dev' into add-rosettafold-all-atom

fa6ef41

Merged with nf-core/dev

5939ce6

jscgh marked this pull request as ready for review November 27, 2024 04:22

jscgh and others added 14 commits November 27, 2024 15:25

Passed linting

362d06b

Improved multiqc processing to fix RFAA decimals and added model to genearte_report.py for visualisation.

e1bcc66

Removed unrelated add-helixfold file

4eef3db

Merge remote-tracking branch 'upstream/dev' into add-rosettafold-all-atom

3dc1448

Updated RFAA definition file

dcaf470

modified: dockerfiles/rosettafold_all_atom.def

1dfe3a3

Added RFAA dockerfile

0883a33

Prettier

409da08

Updated container path to repo

ca466e6

Linted

3db3aa0

Fixed LD_LIBRARY_PATH in RFAA dockerfile

6f04a07

Linted

bf08a5b

Linted

150e449

Merge branch 'dev' into add-rosettafold-all-atom

16fd780

JoseEspinosa requested changes Feb 14, 2025
