WIP

💡 Notes

  • This list accompanies our manuscript 'Title' (preprint, code). We focus on deep learning methods for protein design published in the last four years. This table complements Table 1 presented in our manuscript.
  • We curated this list manually, and as such, it might be incomplete. Please drop us an email or open an issue if you find your method missing.
  • We order the methods by release date (preprint when available) and categorize them into four classes (for more details on these categories, see our preprint):
    • 1: 'fixed-backbone' protein design; p(sequence|structure)
    • 2: structure generation; p(structure)
    • 3: sequence generation; p(sequence) or p(sequence|sequence seed)
    • 4: concomitant sequence and structure design; p(sequence and structure), which can be constrained.
  • Others before us have done excellent work assembling other methods, sometimes overlapping with this list. We link these lists here:
  • We sort our lists chronologically
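
The four classes above can be related through the product rule of probability. As a rough sketch, writing $s$ for a sequence and $x$ for a structure (symbols introduced here for illustration, not taken from the manuscript):

```latex
\begin{align*}
\text{Class 1:}\quad & p(s \mid x) \\
\text{Class 2:}\quad & p(x) \\
\text{Class 3:}\quad & p(s) \ \text{or}\ p(s \mid s_{\text{seed}}) \\
\text{Class 4:}\quad & p(s, x) = p(s \mid x)\, p(x)
\end{align*}
```

The last line is only the generic factorization of a joint distribution; individual Class 4 methods need not model it this way.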

Class I: Protein Sequence design ("Fixed-backbone")

Methods in this class attempt to solve the classical protein design problem: finding an optimal sequence that adopts a predetermined 3D structure.

| Name | Architecture | Number of Parameters | User Input | Output | Training Dataset | Paper | Code | Release (Year/Month) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SPIN2 | FNN | ~105k | 3D structure | sequence | 1,532 X-ray structures | Paper | Code used to be here - no longer available | 2018/02 |
| SPROF | CNN-LSTM | - | 3D structure | sequence | 1,532 X-ray structures | Paper | Code, Web Server | 2018/02 |
| ProteinSolver | GNN | - | 3D structure | sequence | 72,464,122 sequences/adjacency matrices pairs | Paper | Code | 2019/12 |
| ProDCoNN | CNN | >28k | 3D structure | sequence | 26,179 sequences/PDB pairs? | Paper | - | 2019/12 |
| Ingraham et al. | modified Transformer | >3k | | sequence | CATH 4.2 40% sequences/structures | Paper | Code | 2019/12 |
| Anand et al. | CNN | - | 3D structure | sequence | 53,414 CATH domain structures | Paper | Code | 2020/01 |
| DenseCPD | CNN | 3M | 3D structure | sequence | 11,227 3D structures | Paper | Web server | 2020/01 |
| GVP | GVP | - | 3D structure | sequence | CATH 4.2 40% sequences/structures | Paper | Code | 2020/07 |
| Norn et al. | CNN | N/A | Distance map? | sequence | N/A | Paper | Code | 2020/07 |
| Fold2Seq | modified Transformer | - | 3D structure | sequence | 45,995 3D structures from CATH 4.2 | Paper | Code | 2021/06 |
| CNN_protein_landscape | CNN | >10M | 3D structure | sequence | 16,569 PDB chains | Paper | Code | 2021/08 |
| Orellana et al. | GCN | - | 3D structures | sequence | CATH 4.2 40% sequences/structures | Paper | - | 2021/11 |
| McPartlon et al. | modified Transformer | >10k | 3D structures | sequences | 37k 3D structures from BC40 | Paper | - | 2022/04 |
| ESM-IF1 | GVP-Transformer | 142M | 3D structure | sequence | 16k 3D structures + 1.2 M AF2 predictions | Paper | Code | 2022/04 |
| ABACUS-R | Transformer | 152M | 3D structures | sequence | CATH 4.2 | Paper | Code | 2022/02 |
| ProteinMPNN | MPNN | >28k | 3D structure | sequence | CATH 4.2 40% sequences/structures | Paper | Code, Web Interface | 2022/07 |
| ProDESIGN-LE | ? | ? | | sequence | ? | Paper | - | 2022/07 |
| MIF | SGNN | ? | | sequence | ? | Paper | Code | 2022/05 |

Class II: Structure generation

Methods in this class generate structures unconditionally or conditioned on a set of secondary-structure constraints.

| Name | Architecture | Number of Parameters | User Input | Output | Training Dataset | Paper | Code | Release (Year/Month) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 64GAN | GAN | - | - | contact map (3D structure via ADMM) | 427,659 contact maps | Paper | - | 2018/12 |
| Anand et al. | GAN | - | - | distance map (3D structure via CNN) | 800,000 distance maps | Paper | | 2019/03 |
| RamaNet | LSTM | - | - | A sequence of φ and ψ angles | 607 helical structures | Paper | Code | 2019/06 |
| Ig-VAE | VAE | - | - | protein backbone coordinates | 10,768 individual immunoglobulin domains | Paper | Code | 2022/02 |
| SCUBA | NC-NN | ~20k | secondary structure motif | backbone | 12,465 structures | Paper | Code | 2022/02 |
| GENESIS | VAE | - | secondary structure motif | contact map | 40,726 backbones with remodeled loops | Paper | - | 2022/02 |
| DECO-VAE | VAE | - | ? | contact graph (translatable to contact map) | >650,000 contact graphs | Paper | - | 2020/04 |

Class III: Sequence generation

Methods in this class generate sequences, usually with autoregressive language models, and can sometimes be conditioned.
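
For the autoregressive case, a model over a sequence $s = (s_1, \dots, s_L)$ of $L$ residues is typically factorized left to right, and conditioning on a seed amounts to fixing the first $k$ residues. The notation below is illustrative only; the GAN- and VAE-based methods in the table do not follow this form:

```latex
\begin{align*}
p(s) &= \prod_{i=1}^{L} p(s_i \mid s_1, \dots, s_{i-1}) \\
p(s_{k+1}, \dots, s_L \mid s_1, \dots, s_k) &= \prod_{i=k+1}^{L} p(s_i \mid s_1, \dots, s_{i-1})
\end{align*}
```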

| Name | Architecture | Number of Parameters | User Input | Output | Training Dataset | Paper | Code | Release (Year/Month) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ProteinGAN | GAN | 60M | | sequence | 16,706 MDH sequences | Paper | Code | 2019/03 |
| ProGen | Transformer | 1.2B | Optional: sequence or function | sequence | 280M sequences | Paper | Code | 2020/03 |
| ProtXLnet | Transformer | 409M | Optional: sequence | sequence | UniRef100 | Paper | Code | 2020/07 |
| ProtXL | Transformer | 562M | Optional: sequence | sequence | BFD100 | Paper | Code | 2020/07 |
| ProtElectra-Generator | Transformer | 420M | Optional: sequence | sequence | UniRef100 | Paper | Code | 2020/07 |
| ProtT5 | Transformer | 3B | Optional: sequence | sequence | UniRef100 | Paper | Code | 2020/07 |
| EVE | VAE | | MSA | sequence | 3,219 MSAs | Paper | Code | 2020/12 |
| arDCA | one layer autoregressive model + logistic regression | - | Optional: sequence | sequence | 1,019,208 sequences | Paper | Code | 2021/03 |
| DARK3 | Transformer | 110M | Optional: sequence | sequence | 615,000 sequences | Paper | | 2022/01 |
| ProtGPT2 | Transformer | 739M | Optional: sequence | sequence | UniRef50 | Paper | Code | 2022/03 |
| RITA | Transformer | 1.2B | Optional: sequence | sequence | UniRef50 | Paper | Code | 2022/05 |
| ProGEN2 | Transformer | 6.4B | Optional: sequence | sequence | | Paper | Code | 2022/06 |

Class IV: Sequence and structure design

Methods in this class generate sequences and structures concomitantly, and include hallucination methods and constrained generation (inpainting).

| Name | Architecture | Number of Parameters | User Input | Output | Training Dataset | Paper | Code | Release (Year/Month) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Hallucination | CNN (trRosetta) | N/A | random sequence | sequence/structure | N/A | Paper | Code | 2020/07 |
| Constrained hallucination | CNN (trRosetta) | N/A | sequence/structure | sequence/structure | N/A | Paper | Code | 2020/11 |
| Constrained hallucination2 | CNN (RoseTTAFold) | N/A | sequence/structure | sequence/structure | N/A | Paper | Code | 2021/11 |
| RFjoint | CNN (RoseTTAFold, finetuned) | N/A | sequence/structure | sequence/structure | 25% PDB version 02/2020 + 75% AF2 structures | Paper | Code | 2021/11 |
| Protein Diffusion | Diffusion model | - | Secondary structure | sequence/structure | 53,414 3D structures (95% CATH 4.2 S95) | Paper | Code | 2022/05 |