Helix encoder: a compound-protein interaction prediction model specifically designed for class A GPCRs
- Python = 3.7.10
- PyTorch >= 1.2.0
- NumPy = 1.20.2
- RDKit = 2020.09.1
- pandas = 1.3.4
- Gensim >= 3.4.0
- Clone the TransformerCPI repository.
- Copy every file from this repository into the TransformerCPI directory.
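Before running anything, it can help to compare the installed package versions with the requirements listed above. The following is a minimal sketch, not part of this repository: the helper name `installed_versions` is hypothetical, and it assumes the usual import names (`torch` for PyTorch, `rdkit` for RDKit) and that each package exposes a `__version__` attribute.

```python
import importlib
import sys

# Import names of the required packages (assumed: "torch" for PyTorch,
# "rdkit" for RDKit; the others import under their own names).
MODULES = ["torch", "numpy", "rdkit", "pandas", "gensim"]


def installed_versions(modules):
    """Return {module name: version string, or None if it cannot be imported}."""
    versions = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Any import failure is reported as "not installed".
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name, version in installed_versions(MODULES).items():
        print(name, version if version is not None else "not installed")
```

The script only reports versions; comparing them against the list above is left to the reader.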
/csvData
- CSV files of the protein sequences, compound SMILES, and interaction data used in the experiments
/data
- Text data used as input for mol_featurizer
- Data format
- Each line of the text file contains a compound SMILES string, the protein sequences, and the interaction label (0 or 1), in this order, separated by spaces. The sequences of the individual transmembrane regions and extracellular loop regions are likewise separated by spaces. Example:
O=C(OCn1ncc(Br)c(Br)c1=O)c1c(F)cccc1F GLSVAASCLVVLENLLVLAAI LVNITLSDLLTGAAYLANVLL WFLREGLLFTALAASTFSLLF VYGFIGLCWLLAALLGMLPLL FCLVIFAGVLATIMGLYGAIF VLMILLAFLVCWGPLFGLLLA MDWILALAVLNSAVNPIIYSF 1
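The format above can be read back with a few lines of Python. This is a minimal sketch, not code from this repository: the names `CpiRecord` and `parse_cpi_line` are hypothetical, and it assumes one sample per line, with the first token being the SMILES string, the last token the label, and every token in between one region sequence.

```python
from typing import List, NamedTuple


class CpiRecord(NamedTuple):
    smiles: str         # compound SMILES (first token)
    regions: List[str]  # one sequence per TM / extracellular loop region
    label: int          # interaction label, 0 or 1 (last token)


def parse_cpi_line(line: str) -> CpiRecord:
    """Split one space-separated line into SMILES, region sequences, and label."""
    tokens = line.split()
    if len(tokens) < 3:
        raise ValueError("expected SMILES, at least one sequence, and a label")
    smiles, *regions, label = tokens
    if label not in ("0", "1"):
        raise ValueError("label must be 0 or 1, got %r" % label)
    return CpiRecord(smiles, regions, int(label))


example = (
    "O=C(OCn1ncc(Br)c(Br)c1=O)c1c(F)cccc1F "
    "GLSVAASCLVVLENLLVLAAI LVNITLSDLLTGAAYLANVLL WFLREGLLFTALAASTFSLLF "
    "VYGFIGLCWLLAALLGMLPLL FCLVIFAGVLATIMGLYGAIF VLMILLAFLVCWGPLFGLLLA "
    "MDWILALAVLNSAVNPIIYSF 1"
)
record = parse_cpi_line(example)
print(record.smiles, len(record.regions), record.label)
```

For the example line, the parser yields seven region sequences and the label 1.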
/dataset
- Generated when cloning the TransformerCPI repository
- Directory where the data embedded by mol_featurizer is stored
- Generate the input for Helix encoder:
python mol_featurizer_for_TM.py
- Train the Helix encoder model:
python helix_encoder_main.py
- A trained Helix encoder model (TM + ECL2) is included in this repository (/output/model/helixEncoder_TM_ECL2). To use this model to predict your own data, follow these steps:
- Place the data you want to predict in /data/.
- Run mol_featurizer_for_TM.py so that the embedding vectors are placed in /dataset/.
- For prediction, run:
python predict.py
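When preparing your own data for /data/, the lines can be generated programmatically. The sketch below is not part of this repository: `format_cpi_line` and `write_cpi_file` are hypothetical helpers, and it assumes that the prediction input uses the same space-separated format as the training data, including the trailing label column. Check mol_featurizer_for_TM.py for the file name it actually expects.

```python
from typing import Iterable, List, Tuple


def format_cpi_line(smiles: str, regions: List[str], label: int) -> str:
    """Join SMILES, region sequences, and label into one space-separated line."""
    if label not in (0, 1):
        raise ValueError("label must be 0 or 1")
    fields = [smiles, *regions]
    if any(not field or any(ch.isspace() for ch in field) for field in fields):
        raise ValueError("fields must be non-empty and contain no whitespace")
    return " ".join(fields + [str(label)])


def write_cpi_file(path: str, records: Iterable[Tuple[str, List[str], int]]) -> None:
    """Write one formatted line per (smiles, regions, label) record."""
    with open(path, "w") as handle:
        for smiles, regions, label in records:
            handle.write(format_cpi_line(smiles, regions, label) + "\n")


# Toy values for illustration only; real inputs carry one sequence
# per transmembrane / extracellular loop region.
line = format_cpi_line("CCO", ["GLSVAASCLVV", "LVNITLSDLLT"], 1)
print(line)
```

Validating for embedded whitespace matters here because a stray space inside any field would shift every following column.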
If you use this code, please cite the following paper:
@ARTICLE{10.3389/fbinf.2023.1193025,
AUTHOR={Yamane, Haruki and Ishida, Takashi},
TITLE={Helix encoder: a compound-protein interaction prediction model specifically designed for class A GPCRs},
JOURNAL={Frontiers in Bioinformatics},
VOLUME={3},
YEAR={2023},
URL={https://www.frontiersin.org/articles/10.3389/fbinf.2023.1193025},
DOI={10.3389/fbinf.2023.1193025},
ISSN={2673-7647},
}