- Jason Apostol (CS): Worked on model training and on implementing the multi-graph representation of protein structures.
- Anna Su (CBB): Worked on persistent homology computation and persistence layer generation.
- Vasilije Pantelic (MB&B): Worked on the molecular geometry computations.
- Construct Data Files:
  - Run the `run_scripts.sh` bash script to generate files in the `data_files` directory.
  - The script runs the Python scripts in `calculate_features` to create the following files in each subdirectory (where `id` is the protein identifier, e.g. `1by7`):
    - Reconstructed PDB file (`id_rec.pdb`)
    - Backbone structure file (`id_rec_bb.txt`)
    - Bond angle file (`id_ba.txt`)
    - Bond length file (`id_bl.txt`)
    - Dihedral angle file (`id_da.txt`)
  Directory Structure:

      data_files
      └── 1by7
          ├── 1by7_ba.txt
          ├── 1by7_bl.txt
          ├── 1by7_da.txt
          └── 1by7_rec.txt
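Bond lengths, bond angles, and dihedral angles are standard backbone geometry. The sketch below shows how such values can be computed from consecutive backbone atom coordinates. It is a minimal illustration, not the code in `calculate_features`: it assumes the backbone has already been parsed into a list of `(x, y, z)` tuples (the layout of `id_rec_bb.txt` is not described here), and all function names are hypothetical. The dihedral uses the common atan2 formulation, which yields signed angles in degrees.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a: Vec) -> float:
    return math.sqrt(_dot(a, a))


def bond_lengths(coords: Sequence[Vec]) -> List[float]:
    """Distance between each pair of consecutive backbone atoms."""
    return [_norm(_sub(coords[i + 1], coords[i])) for i in range(len(coords) - 1)]


def bond_angles(coords: Sequence[Vec]) -> List[float]:
    """Angle in degrees at atom i+1, formed by atoms i, i+1, i+2."""
    angles = []
    for i in range(len(coords) - 2):
        u = _sub(coords[i], coords[i + 1])
        v = _sub(coords[i + 2], coords[i + 1])
        cos_t = _dot(u, v) / (_norm(u) * _norm(v))
        # Clamp to [-1, 1] to guard against floating-point drift before acos.
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, cos_t)))))
    return angles


def dihedral_angles(coords: Sequence[Vec]) -> List[float]:
    """Signed torsion angle in degrees for each run of four consecutive atoms."""
    out = []
    for i in range(len(coords) - 3):
        b1 = _sub(coords[i + 1], coords[i])
        b2 = _sub(coords[i + 2], coords[i + 1])
        b3 = _sub(coords[i + 3], coords[i + 2])
        n2 = _cross(b2, b3)
        # atan2(|b2| * b1 . (b2 x b3), (b1 x b2) . (b2 x b3))
        y = _norm(b2) * _dot(b1, n2)
        x = _dot(_cross(b1, b2), n2)
        out.append(math.degrees(math.atan2(y, x)))
    return out


if __name__ == "__main__":
    backbone = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
    print(bond_lengths(backbone))
    print(bond_angles(backbone))
    print(dihedral_angles(backbone))
```

For the four-atom right-angle zigzag in the `__main__` block, every bond length is 1.0, both bond angles are 90°, and the single dihedral is +90°; a planar trans arrangement gives ±180°.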
- Construct PH Files:
  - Run the `run_ph.sh` script on the `id_rec_bb.txt` backbone files generated in the Construct Data Files step.
  - This script uses `ph_functions.py` to compute persistence diagrams, vectors, and landscapes for each backbone structure. The results are stored in the `ph_files` directory.
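Persistence diagrams and landscapes can be illustrated on the simplest case. The sketch below computes the dimension-0 persistence pairs of the Vietoris-Rips filtration of a point cloud (for example, backbone atom coordinates) using union-find over edges sorted by length, then evaluates the k-th persistence landscape at a scale t as the k-th largest value of max(0, min(t - b, d - t)) over the finite pairs (b, d). This is a hedged illustration, not `ph_functions.py`: it ignores higher-dimensional features (loops, voids), which need a dedicated topological data analysis library, and it does not reproduce the persistence vectors, whose construction is not described here. The function names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Pair = Tuple[float, float]


def h0_persistence(points: Sequence[Point]) -> List[Pair]:
    """Dimension-0 Vietoris-Rips persistence pairs (birth, death).

    Every point is born at scale 0. Processing edges by increasing length
    (Kruskal-style), each edge that joins two components kills one of them
    at that length. The last surviving component never dies (death = inf).
    """
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )
    pairs: List[Pair] = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((0.0, length))
    if points:
        pairs.append((0.0, math.inf))
    return pairs


def landscape(pairs: Sequence[Pair], k: int, t: float) -> float:
    """k-th persistence landscape (k >= 1) at scale t, over finite pairs only."""
    if k < 1:
        raise ValueError("k must be >= 1")
    # Each finite pair contributes a tent function peaking at (b + d) / 2.
    tents = sorted(
        (max(0.0, min(t - b, d - t)) for b, d in pairs if math.isfinite(d)),
        reverse=True,
    )
    return tents[k - 1] if k <= len(tents) else 0.0


if __name__ == "__main__":
    diagram = h0_persistence([(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)])
    print(diagram)
    print([landscape(diagram, 1, t / 2) for t in range(5)])
```

Infinite pairs are skipped in the landscape because their tent would grow without bound; libraries typically truncate them at a chosen maximum scale instead.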
- Run `geo_prep.py`:
  - Execute `geo_prep.py` to create an index of your data as JSON files in the `datasets_for_geo` directory.
- Clone PaddleHelix:
  - Clone the PaddleHelix repository into the root of the current directory.
  - Replace the `GEM` folder in `PaddleHelix/apps/pretrained_compound/ChemRL` with the `GEM` folder from this repository.
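The folder swap can also be scripted. The following is a minimal standard-library sketch that assumes PaddleHelix was cloned into this repository's root, so that `GEM` and `PaddleHelix` sit side by side. The name `replace_gem` is hypothetical, and because it deletes the upstream `GEM` folder before copying, it is best run on a fresh clone.

```python
import shutil
from pathlib import Path

# Location of GEM inside a PaddleHelix checkout, as given in the steps above.
GEM_SUBPATH = Path("apps") / "pretrained_compound" / "ChemRL" / "GEM"


def replace_gem(repo_root: Path) -> Path:
    """Replace PaddleHelix's GEM folder with this repository's GEM folder.

    Assumes repo_root contains both GEM/ and the cloned PaddleHelix/.
    Returns the path of the installed copy.
    """
    src = repo_root / "GEM"
    dst = repo_root / "PaddleHelix" / GEM_SUBPATH
    if not src.is_dir():
        raise FileNotFoundError(f"source folder not found: {src}")
    if dst.exists():
        shutil.rmtree(dst)  # drop the upstream GEM before copying ours in
    shutil.copytree(src, dst)
    return dst


if __name__ == "__main__":
    print(replace_gem(Path.cwd()))
```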
- Install Dependencies:
  - Ensure all dependencies listed in `requirements.txt` are installed.
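One way to verify this step before training is to compare `requirements.txt` against the installed distributions. The sketch below is a hypothetical helper (`missing_requirements`) built on the standard-library `importlib.metadata`; it handles only simple `name==version`-style lines and skips pip options such as `-r` and `-e` as well as URL requirements. Anything it reports can then be installed with `python -m pip install -r requirements.txt`.

```python
import re
from importlib import metadata
from pathlib import Path
from typing import List

# Leading project name of a requirement line, e.g. "numpy" in "numpy>=1.20".
_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def missing_requirements(req_file: Path) -> List[str]:
    """Return requirement names from req_file that are not installed.

    Comments, blank lines, pip options (lines starting with '-') and URL
    requirements are skipped. Version constraints are not checked.
    """
    missing = []
    for raw in req_file.read_text().splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-") or "://" in line:
            continue
        match = _NAME.match(line)
        if match is None:
            continue
        name = match.group(0)
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_requirements(Path("requirements.txt")))
```

Name matching relies on `importlib.metadata`, so a package whose distribution name differs only in case or separator characters may be misreported on some Python versions.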
- Run `knottrain.sh`:
  - Locate the script in the forked PaddleHelix repository at `PaddleHelix/apps/pretrained_compound/ChemRL/GEM/scripts`.
  - Modify `data_path` in the script to point to the JSON files generated by `geo_prep.py`.
  - From the `GEM` directory, execute the script via `sh scripts/knottrain.sh`. This both preprocesses the data and trains the models described in the paper.
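Editing `data_path` by hand is enough; if you prefer to script it, the sketch below rewrites the assignment in place. It is a hypothetical helper that assumes the script assigns the variable on its own line as `data_path=...` (an `export data_path=...` form, for instance, would not match). Whether `data_path` should name the `datasets_for_geo` directory or a specific JSON file inside it depends on how `knottrain.sh` uses the variable, which is not described here.

```python
import re
from pathlib import Path

# Matches a line such as `data_path=./old/path` (optionally indented).
# This layout is an assumption about knottrain.sh, not taken from the script.
_DATA_PATH_LINE = re.compile(r"^([ \t]*data_path=).*$", re.MULTILINE)


def set_data_path(script: Path, new_path: str) -> bool:
    """Point every `data_path=` assignment in `script` at `new_path`.

    Returns True if at least one line was rewritten.
    """
    text = script.read_text()
    # A function replacement keeps backslashes in new_path from being
    # interpreted as regex escapes.
    new_text, count = _DATA_PATH_LINE.subn(
        lambda m: f'{m.group(1)}"{new_path}"', text
    )
    if count:
        script.write_text(new_text)
    return count > 0


if __name__ == "__main__":
    gem = Path("PaddleHelix/apps/pretrained_compound/ChemRL/GEM")
    changed = set_data_path(gem / "scripts" / "knottrain.sh",
                            str(Path("datasets_for_geo").resolve()))
    print("updated" if changed else "no data_path= line found")
```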