MULTICOM3 is an add-on package that improves AlphaFold2- and AlphaFold-Multimer-based prediction of protein tertiary and quaternary structures through diverse multiple sequence alignment (MSA) sampling, template identification, structural prediction evaluation and structural prediction refinement. It can improve AlphaFold2-based tertiary structure prediction by 8-10% and AlphaFold-Multimer-based quaternary structure prediction by 5-8%. In CASP15, MULTICOM3 used AlphaFold v2.2.0 as the engine to generate structural predictions. In this release, it is adjusted to run on top of AlphaFold v2.3.2 (https://github.com/deepmind/alphafold/releases/tag/v2.3.2) to leverage the latest improvements to AlphaFold2. You can install MULTICOM3 on top of your AlphaFold2 and AlphaFold-Multimer to improve both the tertiary structure prediction of monomers and the quaternary structure prediction of multimers.
git clone --recursive https://github.com/BioinfoMachineLearning/MULTICOM3
Note: ideally, a computer with an NVIDIA V100 or better GPU, 40 GB or more of GPU memory, and 85 GB or more of RAM is needed to run the system.
Installation (Docker version, modified from alphafold v2.3.2)
- Install Docker.
- Install NVIDIA Container Toolkit for GPU support.
- Setup running Docker as a non-root user.
- Check that AlphaFold will be able to use a GPU by running:
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
The output of this command should show a list of your GPUs. If it doesn't, check if you followed all steps correctly when setting up the NVIDIA Container Toolkit or take a look at the following NVIDIA Docker issue.
If you wish to run AlphaFold using Singularity (a common containerization platform on HPC systems) we recommend using some of the third-party Singularity setups as linked in google-deepmind/alphafold#10 or google-deepmind/alphafold#24.
- Build the Docker image:
docker build -f docker/Dockerfile -t multicom3 .
If you encounter the following error:
W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is not signed.
use the workaround described in google-deepmind/alphafold#463 (comment).
- Install the run_docker.py dependencies. Note: You may optionally wish to create a Python virtual environment to prevent conflicts with your system's Python environment.
conda create -n docker python=3.8
conda activate docker
pip3 install -r docker/requirements.txt
conda install -c conda-forge -c bioconda mmseqs2=14.7e284 -y
- Make sure that the output directory exists (the default is /tmp/multicom3) and that you have sufficient permissions to write into it.
- Download genetic databases in AlphaFold2/AlphaFold-Multimer
bash $MULTICOM3_INSTALL_DIR/tools/alphafold-v2.3.2/scripts/download_all_data.sh <YOUR_ALPHAFOLD_DB_DIR>
Note: The download directory <YOUR_ALPHAFOLD_DB_DIR> should not be a subdirectory in the MULTICOM3 repository directory. If it is, the Docker build will be slow as the large databases will be copied during the image creation.
- Download additional genetic databases and tools in MULTICOM3
python download_database_and_tools.py --multicom3db_dir <YOUR_MULTICOM3_DB_DIR>
Note: The download directory <YOUR_MULTICOM3_DB_DIR> should not be a subdirectory in the MULTICOM3 repository directory. If it is, the Docker build will be slow as the large databases will be copied during the image creation.
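The two notes above, together with the output-directory requirement, can be checked before building the image. The following is a minimal sketch and not part of MULTICOM3: the helper names `is_inside` and `preflight` are hypothetical, and the paths you pass in are your own.

```python
import os
from pathlib import Path


def is_inside(child: str, parent: str) -> bool:
    """Return True if `child` is `parent` itself or lies anywhere below it."""
    child_path = Path(child).resolve()
    parent_path = Path(parent).resolve()
    return child_path == parent_path or parent_path in child_path.parents


def preflight(repo_dir: str, db_dirs: list, output_dir: str) -> list:
    """Collect human-readable problems; an empty list means the layout looks fine."""
    problems = []
    for db_dir in db_dirs:
        # Databases inside the repository would be copied into the Docker build context.
        if is_inside(db_dir, repo_dir):
            problems.append(f"{db_dir} is inside the repository {repo_dir}")
    if not os.path.isdir(output_dir):
        problems.append(f"output directory {output_dir} does not exist")
    elif not os.access(output_dir, os.W_OK):
        problems.append(f"output directory {output_dir} is not writable")
    return problems
```

Calling `preflight(repo_dir, [afdb_dir, multicom3db_dir], "/tmp/multicom3")` returns a list of messages that you can print before running `docker build`.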
Install AlphaFold/AlphaFold-Multimer and other required third-party packages (modified from alphafold_non_docker)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && bash Miniconda3-latest-Linux-x86_64.sh
conda create --name multicom3 python==3.8
conda update -n base conda
conda activate multicom3
- Change the cudatoolkit==11.2.2 version if it is not supported on your system
conda install -y -c conda-forge openmm==7.5.1 cudatoolkit==11.2.2 pdbfixer
conda install -y -c bioconda hmmer hhsuite==3.3.0 kalign2
- Change the jaxlib==0.3.25+cuda11.cudnn805 version if it is not supported on your system
pip install absl-py==1.0.0 biopython==1.79 chex==0.0.7 dm-haiku==0.0.9 dm-tree==0.1.6 immutabledict==2.0.0 jax==0.3.25 ml-collections==0.1.0 numpy==1.21.6 pandas==1.3.4 protobuf==3.20.1 scipy==1.7.0 tensorflow-cpu==2.9.0
pip install --upgrade --no-cache-dir jax==0.3.25 jaxlib==0.3.25+cuda11.cudnn805 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
# Replace $MULTICOM3_INSTALL_DIR with your MULTICOM3 installation directory
wget -q -P $MULTICOM3_INSTALL_DIR/tools/alphafold-v2.3.2/alphafold/common/ https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt
# Replace $MULTICOM3_INSTALL_DIR with your MULTICOM3 installation directory
cd ~/anaconda3/envs/multicom3/lib/python3.8/site-packages/ && patch -p0 < $MULTICOM3_INSTALL_DIR/tools/alphafold-v2.3.2/docker/openmm.patch
# or
cd ~/miniconda3/envs/multicom3/lib/python3.8/site-packages/ && patch -p0 < $MULTICOM3_INSTALL_DIR/tools/alphafold-v2.3.2/docker/openmm.patch
conda install tqdm
conda install -c conda-forge -c bioconda mmseqs2=14.7e284 -y
# Replace $MULTICOM3_INSTALL_DIR with your MULTICOM3 installation directory
bash $MULTICOM3_INSTALL_DIR/tools/alphafold-v2.3.2/scripts/download_all_data.sh <YOUR_ALPHAFOLD_DB_DIR>
# Note: here the parameters should be the absolute paths
python download_database_and_tools.py --multicom3db_dir <YOUR_MULTICOM3_DB_DIR>
# Configure the MULTICOM3 system
# Replace $MULTICOM3_INSTALL_DIR with your MULTICOM3 installation directory
# Replace <YOUR_MULTICOM3_DB_DIR> and <YOUR_ALPHAFOLD_DB_DIR> with your downloaded MULTICOM3 and AlphaFold database directories
python configure.py --envdir ~/miniconda3/envs/multicom3 --multicom3db_dir <YOUR_MULTICOM3_DB_DIR> --afdb_dir <YOUR_ALPHAFOLD_DB_DIR>
# e.g.,
# python download_database_and_tools.py \
# --multicom3db_dir /home/multicom3/tools/multicom3_db
# python configure.py \
# --multicom3db_dir /home/multicom3/tools/multicom3_db \
# --afdb_dir /home/multicom3/tools/alphafold_databases/
The configure.py python script will
- Copy the alphafold_addon scripts
- Create the configuration file (bin/db_option) for running the system
The genetic databases used by AlphaFold2/AlphaFold-Multimer are assumed to have been installed as part of the AlphaFold2/AlphaFold-Multimer installation.
Additional databases will be installed for the MULTICOM3 system by download_database_and_tools.py:
- AlphaFoldDB: ~53G
- ColabFold database: ~1.7T
- Integrated Microbial Genomes (IMG): ~1.5T
- Metaclust: ~114G
- STRING: ~129G
- pdb_complex: ~38G
- pdb_sort90: ~48G
- Uniclust30: ~87G
# AlphaFold2 parameters
monomer_num_ensemble = 1
monomer_num_recycle = 3
num_monomer_predictions_per_model = 1
monomer_model_preset = monomer
# AlphaFold-Multimer parameters
multimer_num_ensemble = 1
multimer_num_recycle = 3
num_multimer_predictions_per_model = 5
multimer_model_preset = multimer
# Common parameters
alphafold_benchmark = True
use_gpu_relax = True
models_to_relax = ALL # ALL, BEST, NONE
max_template_date = 2024-06-01
Please refer to AlphaFold2 to understand the meaning of the parameters. The parameter values stored in bin/db_option file are applied to all the AlphaFold2/AlphaFold-Multimer variants in the MULTICOM3 system to generate predictions.
For Docker version installation, you can change the default parameter values in docker/db_option.
For the non-Docker installation, the default bin/db_option file is created automatically by configure.py during installation. The default parameter values above can be changed if needed.
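Before editing the option file, it can help to read it into a dictionary and inspect the current values. The sketch below assumes the file uses only the `key = value` layout with `#` comments shown in the listing above; `parse_option_text` is a hypothetical helper, not a MULTICOM3 function, and the package's own parser may behave differently.

```python
def parse_option_text(text: str) -> dict:
    """Parse 'key = value' lines, ignoring blank lines and '#' comments."""
    options = {}
    for raw_line in text.splitlines():
        # Drop full-line comments and trailing comments alike.
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


sample = """
# Common parameters
use_gpu_relax = True
models_to_relax = ALL # ALL, BEST, NONE
max_template_date = 2024-06-01
"""
options = parse_option_text(sample)
print(options["models_to_relax"])
```

All values are kept as strings, so a caller would still need to convert entries such as `monomer_num_recycle` to integers where required.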
conda activate multicom3
# Replace $MULTICOM3_INSTALL_DIR with your MULTICOM3 installation directory (absolute path)
export PYTHONPATH=$MULTICOM3_INSTALL_DIR
# e.g.,
# conda activate multicom3
# export PYTHONPATH=/home/multicom3/MULTICOM3
Now MULTICOM3 is ready for you to make predictions.
Say we have a monomer with the sequence <SEQUENCE>. The input sequence file should be in the FASTA format as follows:
>sequence_name
<SEQUENCE>
Note: It is recommended that the name of the sequence file in FASTA format should be the same as the sequence name.
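The naming recommendation above can be checked with a few lines of Python. This is an illustrative sketch only: `read_fasta` and `name_matches_file` are hypothetical helpers, and it assumes the sequence name is the first whitespace-delimited token after `>`.

```python
from pathlib import Path


def read_fasta(text: str) -> list:
    """Return (name, sequence) pairs from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)  # sequences may span several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def name_matches_file(fasta_path: str, text: str) -> bool:
    """True when a single-record FASTA file is named after its sequence."""
    records = read_fasta(text)
    return len(records) == 1 and Path(fasta_path).stem == records[0][0]
```

For a file called `my_protein.fasta` (a made-up name), the check passes only if the header line is `>my_protein`.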
Then run the following command:
# Please provide absolute path for the input parameters
# docker version
python3 docker/run_docker.py \
--mode=monomer \
--option_file=docker/db_option \
--fasta_path=$YOUR_FASTA \
--run_img=False \
--af_db_dir=$YOUR_ALPHAFOLD_DB_DIR \
--multicom3_db_dir=$YOUR_MULTICOM3_DB_DIR \
--output_dir=$OUTDIR
# non docker version
python bin/monomer.py \
--option_file=bin/db_option \
--fasta_path=$YOUR_FASTA \
--run_img=False \
--output_dir=$OUTDIR
option_file is a file in the MULTICOM3 package that stores key parameter values for AlphaFold2 and AlphaFold-Multimer. fasta_path is the full path of the file storing the input protein sequence(s) in FASTA format. output_dir specifies where the prediction results are stored. The --run_img parameter controls whether the IMG database is used. With --run_img=True, the program pauses at the monomer prediction generation stage and waits for the IMG alignment to be created. Because the IMG database is very large, generating alignments from it can take much longer, potentially several days, so run_img is set to False by default. We advise setting run_img to True only if the other alignments do not yield good results.
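When many targets have to be processed, the non-Docker command above can be assembled programmatically. The sketch below only builds the argument list from the flags shown in this section; `build_monomer_command` is a hypothetical helper, and the example paths are placeholders.

```python
import shlex


def build_monomer_command(fasta_path, output_dir, option_file="bin/db_option", run_img=False):
    """Assemble the non-Docker monomer command as an argument list."""
    return [
        "python", "bin/monomer.py",
        f"--option_file={option_file}",
        f"--fasta_path={fasta_path}",
        f"--run_img={run_img}",  # IMG search stays off unless explicitly requested
        f"--output_dir={output_dir}",
    ]


cmd = build_monomer_command("/data/my_protein.fasta", "/data/out/my_protein")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` from the MULTICOM3 installation directory once the conda environment and PYTHONPATH are set as described earlier.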
$OUTDIR/ # Your output directory
    N1_monomer_alignments_generation/ # Working directory for generating monomer MSAs
    N1_monomer_alignments_generation_img/ # Working directory for generating the IMG MSA
        # Note: the img.running file may use a lot of disk space
    N2_monomer_template_search/ # Working directory for searching monomer templates
    N3_monomer_structure_generation/ # Working directory for generating monomer structural predictions
    N4_monomer_structure_evaluation/ # Working directory for evaluating the monomer structural predictions
        - alphafold_ranking.csv # AlphaFold2 pLDDT ranking
        - pairwise_ranking.tm # Pairwise (APOLLO) ranking
        - pairwise_af_avg.ranking # Average ranking of the two
    N5_monomer_structure_refinement_avg/ # Working directory for monomer structure refinement
    N5_monomer_structure_refinement_avg_final/ # Output directory for the refined monomer predictions
        - final_ranking.csv # AlphaFold2 pLDDT ranking of the original and refined predictions
- The predictions and ranking files are saved in the N4_monomer_structure_evaluation folder. You can check the AlphaFold2 pLDDT score ranking file (alphafold_ranking.csv) to look for the structure with the highest pLDDT score. The pairwise_ranking.tm and pairwise_af_avg.ranking files are the other two ranking files.
- The refined monomer predictions are saved in N5_monomer_structure_refinement_avg_final.
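Picking the highest-scoring prediction from a ranking CSV can be scripted. The column layout of alphafold_ranking.csv is not described here, so the sketch below takes the column names as arguments; the names `model` and `plddt_avg` and the rows in the example are made up for illustration, and `top_ranked` is a hypothetical helper.

```python
import csv
import io


def top_ranked(csv_text: str, name_column: str, score_column: str) -> tuple:
    """Return (name, score) of the row with the highest numeric score."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("ranking file has no rows")
    best = max(rows, key=lambda row: float(row[score_column]))
    return best[name_column], float(best[score_column])


# Made-up example data; check the header of your own ranking file first.
example = "model,plddt_avg\nmodel_a.pdb,81.2\nmodel_b.pdb,88.9\n"
print(top_ranked(example, "model", "plddt_avg"))
```

For a real run, read the file with `open(path).read()` and pass the column names found in its header line.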
Say we have a homomer with 4 copies of the same sequence <SEQUENCE>. The input file should be in the format as follows:
>sequence_1
<SEQUENCE>
>sequence_2
<SEQUENCE>
>sequence_3
<SEQUENCE>
>sequence_4
<SEQUENCE>
Then run the following command:
# Please provide absolute path for the input parameters
# docker version
python3 docker/run_docker.py \
--mode=homomer \
--option_file=docker/db_option \
--fasta_path=$YOUR_FASTA \
--run_img=False \
--af_db_dir=$YOUR_ALPHAFOLD_DB_DIR \
--multicom3_db_dir=$YOUR_MULTICOM3_DB_DIR \
--output_dir=$OUTDIR
# non docker version
python bin/homomer.py \
--option_file=bin/db_option \
--fasta_path=$YOUR_FASTA \
--run_img=False \
--output_dir=$OUTDIR
Say we have an A2B3 heteromer, i.e. with 2 copies of <SEQUENCE A> and 3 copies of <SEQUENCE B>. The input file should be in the format as follows (the same sequences should be grouped together):
>sequence_1
<SEQUENCE A>
>sequence_2
<SEQUENCE A>
>sequence_3
<SEQUENCE B>
>sequence_4
<SEQUENCE B>
>sequence_5
<SEQUENCE B>
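Writing one record per chain by hand becomes error-prone for larger complexes. The following sketch expands a stoichiometry into the layout shown above, keeping identical sequences grouped together; `build_multimer_fasta` is a hypothetical helper and the two sequences in the example are placeholders, not real proteins.

```python
def build_multimer_fasta(subunits) -> str:
    """Expand (sequence, copies) pairs into FASTA text with one record per chain."""
    lines = []
    index = 1
    for sequence, copies in subunits:
        if copies < 1:
            raise ValueError("each subunit needs at least one copy")
        # Emitting all copies of a subunit consecutively keeps identical chains grouped.
        for _ in range(copies):
            lines.append(f">sequence_{index}")
            lines.append(sequence)
            index += 1
    return "\n".join(lines) + "\n"


# A2B3 heteromer: 2 copies of the first sequence, 3 copies of the second.
fasta_text = build_multimer_fasta([("MKVLAAGIV", 2), ("GSHMTEYKL", 3)])
print(fasta_text, end="")
```

A homomer with 4 copies is the special case `build_multimer_fasta([(sequence, 4)])`.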
Then run the following command:
# Please provide absolute path for the input parameters
# docker version
python3 docker/run_docker.py \
--mode=heteromer \
--option_file=docker/db_option \
--fasta_path=$YOUR_FASTA \
--run_img=False \
--af_db_dir=$YOUR_ALPHAFOLD_DB_DIR \
--multicom3_db_dir=$YOUR_MULTICOM3_DB_DIR \
--output_dir=$OUTDIR
# non docker version
python bin/heteromer.py \
--option_file=bin/db_option \
--fasta_path=$YOUR_FASTA \
--run_img=False \
--output_dir=$OUTDIR
$OUTDIR/ # Your output directory
    N1_monomer_alignments_generation/ # Working directory for generating monomer MSAs
        - Subunit A
        - Subunit B
        - ...
    N1_monomer_alignments_generation_img/ # Working directory for generating IMG MSA
        - Subunit A
        - Subunit B
        - ...
    N2_monomer_template_search/ # Working directory for searching monomer templates
        - Subunit A
        - Subunit B
        - ...
    N3_monomer_structure_generation/ # Working directory for generating monomer structural predictions
        - Subunit A
        - Subunit B
        - ...
    N4_monomer_alignments_concatenation/ # Working directory for concatenating the monomer MSAs
    N5_monomer_templates_search/ # Working directory for concatenating the monomer templates
    N6_multimer_structure_generation/ # Working directory for generating multimer structural predictions
    N7_monomer_structure_evaluation/ # Working directory for evaluating monomer structural predictions
        - Subunit A
            # Rankings for all the predictions
            - alphafold_ranking.csv # AlphaFold2 pLDDT ranking
            - pairwise_ranking.tm # Pairwise (APOLLO) ranking
            - pairwise_af_avg.ranking # Average ranking of the two
            # Rankings for the predictions generated by monomer structure prediction
            - alphafold_ranking_monomer.csv # AlphaFold2 pLDDT ranking
            - pairwise_af_avg_monomer.ranking # Average ranking
            # Rankings for the predictions extracted from multimer predictions
            - alphafold_ranking_multimer.csv # AlphaFold2 pLDDT ranking
            - pairwise_af_avg_multimer.ranking # Average ranking
        - Subunit B
        - ...
    N8_multimer_structure_evaluation/ # Working directory for evaluating multimer structural predictions
        - alphafold_ranking.csv # AlphaFold2 pLDDT ranking
        - multieva.csv # Pairwise ranking using MMalign
        - pairwise_af_avg.ranking # Average ranking of the two
    N9_multimer_structure_refinement/ # Working directory for refining multimer structural predictions
    N9_multimer_structure_refinement_final/ # Output directory for the refined multimer predictions
- The predictions and ranking files are saved in N8_multimer_structure_evaluation. Similarly, you can check the AlphaFold-Multimer confidence score ranking file (alphafold_ranking.csv) to find the structure with the highest confidence score predicted by AlphaFold-Multimer. The multieva.csv and pairwise_af_avg.ranking files are the other two ranking files.
- The refined multimer predictions are saved in N9_multimer_structure_refinement_final.
- The monomer structures and ranking files are saved in N7_monomer_structure_evaluation if you want to check the predictions and rankings for the monomer structures.
If you use this package for tertiary or quaternary structure prediction, please cite:
Tertiary (monomer) structure prediction
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., ... & Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589.
Liu, J., Guo, Z., Wu, T., Roy, R. S., Chen, C., & Cheng, J. (2023). Improving AlphaFold2-based protein tertiary structure prediction with MULTICOM in CASP15. Communications Chemistry, 6(1), 188. (https://www.nature.com/articles/s42004-023-00991-6)
Quaternary (multimer) structure prediction
Evans, R., O’Neill, M., Pritzel, A., Antropova, N., Senior, A., Green, T., ... & Hassabis, D. (2021). Protein complex prediction with AlphaFold-Multimer. BioRxiv, 2021-10.
Liu, J., Guo, Z., Wu, T., Roy, R. S., Quadir, F., Chen, C., & Cheng, J. (2023). Enhancing alphafold-multimer-based protein complex structure prediction with MULTICOM in CASP15. Communications Biology, 6(1), 1140. (https://www.nature.com/articles/s42003-023-05525-3)