annasu1225 / GNN_PersLay Public

This project integrates graph neural networks (GNNs) with persistent homology to classify the molecular functions of knotted proteins.

Repository files (19 commits):
GEM
PDBs
PaddleHelix
calculate_features
catalogs
data_files
molecular_function_labels
ph_files
.gitignore
README.md
geo_prep.py
ph_functions.py
requirements.txt
run_ph.sh
run_scripts.sh

GNN_PersLay

Contributors

Jason Apostol (CS): Worked on training and implementing multi-graph representation for protein structures.
Anna Su (CBB): Worked on persistent homology computation and persistence layer generation.
Vasilije Pantelic (MB&B): Worked on the molecular geometry computations.

Project Workflow

Construct Data Files:
- Run the run_scripts.sh bash script to generate files in the data_files directory.
- The script runs the Python scripts in calculate_features to create the following files in each structure's subdirectory, where id is the structure's identifier (for example, 1by7):
  1. Reconstructed PDB file (id_rec.pdb)
  2. Backbone structure file (id_rec_bb.txt)
  3. Bond angle file (id_ba.txt)
  4. Bond length file (id_bl.txt)
  5. Dihedral angle file (id_da.txt)
Directory Structure:
```
data_files
└── 1by7
    ├── 1by7_ba.txt
    ├── 1by7_bl.txt
    ├── 1by7_da.txt
    └── 1by7_rec.txt
```
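The bond length, bond angle, and dihedral angle files hold standard backbone geometry quantities. As a rough illustration of what those quantities are, here is a minimal standard-library sketch that computes them from a list of consecutive backbone atom coordinates. The function names, the use of degrees, and the sample coordinates are assumptions made for this example; the scripts in calculate_features may organize the computation differently.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def bond_length(p, q):
    """Distance between two bonded atoms."""
    return _norm(_sub(p, q))

def bond_angle(p, q, r):
    """Angle at atom q formed by atoms p-q-r, in degrees."""
    u, v = _sub(p, q), _sub(r, q)
    cos_t = _dot(u, v) / (_norm(u) * _norm(v))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def dihedral_angle(p0, p1, p2, p3):
    """Torsion angle about the p1-p2 bond, in degrees, via atan2."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    b2_len = _norm(b2)
    unit_b2 = tuple(x / b2_len for x in b2)
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), unit_b2)
    return math.degrees(math.atan2(y, x))

def backbone_geometry(coords):
    """Lengths, angles, and dihedrals over consecutive atoms in coords."""
    lengths = [bond_length(a, b) for a, b in zip(coords, coords[1:])]
    angles = [bond_angle(a, b, c)
              for a, b, c in zip(coords, coords[1:], coords[2:])]
    dihedrals = [dihedral_angle(a, b, c, d)
                 for a, b, c, d in zip(coords, coords[1:], coords[2:], coords[3:])]
    return lengths, angles, dihedrals

# Made-up coordinates for four consecutive atoms (not taken from 1by7).
coords = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
lengths, angles, dihedrals = backbone_geometry(coords)
```

A chain of n atoms yields n-1 bond lengths, n-2 bond angles, and n-3 dihedral angles, which is why the three quantities are naturally stored in separate files.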
Construct PH Files:
- Run the run_ph.sh script to process the id_rec_bb.txt files generated in the Construct Data Files step.
- This script uses ph_functions.py to compute persistence diagrams, vectors, and landscapes for each backbone structure. The results are stored in the ph_files directory.
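To give a sense of what a persistence diagram records, the sketch below computes only the 0-dimensional part (connected components) of a Vietoris-Rips filtration on a point cloud, using a union-find structure and edge length as the filtration value. It is a simplified stand-in written for illustration: the function name and sample points are hypothetical, it does not cover higher-dimensional features, vectors, or landscapes, and it is not the code in ph_functions.py, which may rely on a dedicated persistent homology library.

```python
import math
from itertools import combinations

def h0_persistence(points):
    """Finite (birth, death) pairs of 0-dimensional persistent homology.

    Every point is born as its own component at scale 0. Processing edges
    in order of increasing length, a component dies at the length of the
    edge that first merges it into another component.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            pairs.append((0.0, length))
    # One component never dies, so there are n - 1 finite pairs.
    return pairs

# Three made-up collinear points standing in for backbone coordinates.
diagram = h0_persistence([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)])
lifetimes = [death - birth for birth, death in diagram]
```

The list of lifetimes is one simple way to turn a diagram into numbers; persistence vectors and landscapes are richer summaries built from the same (birth, death) pairs.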
Run geo_prep.py:
- Execute geo_prep.py to create an index for your data and place it in the datasets_for_geo directory.
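As an illustration of what building an index over data_files can look like, here is a minimal sketch that scans each structure's subdirectory and writes a JSON file. Everything about the layout is an assumption for this example: the field names, the index.json file name, the choice to skip incomplete subdirectories, and the second structure ID are all hypothetical, and the actual format produced by geo_prep.py may differ.

```python
import json
import tempfile
from pathlib import Path

# Suffixes of the per-structure geometry files this sketch looks for.
SUFFIXES = ("ba", "bl", "da")

def build_index(data_dir):
    """Return one entry per subdirectory that has all expected files."""
    entries = []
    for sub in sorted(Path(data_dir).iterdir()):
        if not sub.is_dir():
            continue
        structure_id = sub.name
        files = {s: str(sub / f"{structure_id}_{s}.txt") for s in SUFFIXES}
        if all(Path(p).is_file() for p in files.values()):
            # "id" and "files" are hypothetical field names.
            entries.append({"id": structure_id, "files": files})
    return entries

def write_index(entries, out_path):
    """Write the index as JSON, creating the parent directory if needed."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(entries, indent=2))

# Self-contained demo on a temporary directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    complete = root / "data_files" / "1by7"
    complete.mkdir(parents=True)
    for suffix in SUFFIXES:
        (complete / f"1by7_{suffix}.txt").write_text("")
    # Hypothetical structure with no files yet; it is left out of the index.
    (root / "data_files" / "example_incomplete").mkdir()

    index = build_index(root / "data_files")
    out_file = root / "datasets_for_geo" / "index.json"
    write_index(index, out_file)
    reloaded = json.loads(out_file.read_text())
```

Skipping incomplete subdirectories keeps later steps from failing on missing files; raising an error instead would be an equally reasonable choice.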
Clone PaddleHelix:
- Clone the PaddleHelix repository into the root of this repository.
- Replace GEM in PaddleHelix/apps/pretrained_compound/ChemRL with the GEM folder in this repository.
Install Dependencies:
- Ensure all dependencies listed in requirements.txt are installed.
Run knottrain.sh:
- Find knottrain.sh in the PaddleHelix repository from the Clone PaddleHelix step, under PaddleHelix/apps/pretrained_compound/ChemRL/GEM/scripts.
- Modify the data_path in the script to point to the JSON files generated by geo_prep.py.
- Execute via sh scripts/knottrain.sh. This will both preprocess the data and train the models described in the paper.
