This project integrates GNN with persistent homology to classify molecular functions of knotted proteins.

annasu1225/GNN_PersLay

GNN_PersLay

Contributors

  • Jason Apostol (CS): Worked on training and implementing multi-graph representation for protein structures.
  • Anna Su (CBB): Worked on persistent homology computation and persistence layer generation.
  • Vasilije Pantelic (MB&B): Worked on the molecular geometry computations.

Project Workflow

  1. Construct Data Files:

    • Run the run_scripts.sh bash script to generate files in the data_files directory.
    • The script runs the Python scripts in calculate_features to create the following files in each protein's subdirectory (id is the protein identifier, e.g. 1by7):
      1. Reconstructed PDB file (id_rec.pdb)
      2. Backbone structure file (id_rec_bb.txt)
      3. Bond angle file (id_ba.txt)
      4. Bond length file (id_bl.txt)
      5. Dihedral angle file (id_da.txt)

    Directory Structure:

    data_files
    └── 1by7
        ├── 1by7_ba.txt
        ├── 1by7_bl.txt
        ├── 1by7_da.txt
        └── 1by7_rec.txt
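
    The bond length, bond angle, and dihedral angle files above hold standard backbone geometry. As a rough illustration of what those quantities are, here is a minimal sketch that computes them from a list of 3D coordinates. The function names are hypothetical and are not taken from calculate_features; the actual scripts may use different conventions, units, or atom selections.

    ```python
    import math

    # Small vector helpers on 3-tuples (hypothetical names, standard library only).
    def _sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def _dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def _cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    def _norm(a):
        return math.sqrt(_dot(a, a))

    def bond_length(p, q):
        """Euclidean distance between two atoms."""
        return _norm(_sub(q, p))

    def bond_angle(p, q, r):
        """Angle at q, in degrees, formed by atoms p-q-r."""
        u, v = _sub(p, q), _sub(r, q)
        cos_t = _dot(u, v) / (_norm(u) * _norm(v))
        return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

    def dihedral(p0, p1, p2, p3):
        """Signed dihedral angle, in degrees, about the p1-p2 bond."""
        b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
        n1, n2 = _cross(b0, b1), _cross(b1, b2)
        b1_hat = tuple(x / _norm(b1) for x in b1)
        return math.degrees(math.atan2(_dot(b1_hat, _cross(n1, n2)), _dot(n1, n2)))

    def backbone_geometry(coords):
        """Lengths, angles, and dihedrals over consecutive atoms of a chain."""
        lengths = [bond_length(a, b) for a, b in zip(coords, coords[1:])]
        angles = [bond_angle(a, b, c)
                  for a, b, c in zip(coords, coords[1:], coords[2:])]
        dihedrals = [dihedral(a, b, c, d)
                     for a, b, c, d in zip(coords, coords[1:], coords[2:], coords[3:])]
        return lengths, angles, dihedrals
    ```

    For a chain of n atoms this yields n-1 lengths, n-2 angles, and n-3 dihedrals.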
    
  2. Construct PH Files:

    • Run the run_ph.sh script on the id_rec_bb.txt files generated in Step 1.
    • This script utilizes ph_functions.py to compute persistence diagrams, vectors, and landscapes for each backbone structure. The results are stored in the ph_files directory.
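
    To make the idea of a persistence diagram concrete, here is a minimal sketch that computes only the 0-dimensional diagram (connected components) of a point cloud under a distance filtration, using union-find over edges sorted by length. This is an illustration under stated assumptions: it is not the code in ph_functions.py, the function name is hypothetical, and it does not cover higher-dimensional features, persistence vectors, or landscapes.

    ```python
    import math
    from itertools import combinations

    def h0_persistence(points):
        """Return (birth, death) pairs for connected components.

        Every point is born at scale 0. A component dies at the length of the
        edge that first merges it into another component. One component never
        dies and is reported with death = infinity.
        """
        n = len(points)
        if n == 0:
            return []
        parent = list(range(n))

        def find(i):
            # Union-find root lookup with path halving.
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        # All pairwise edges, processed from shortest to longest.
        edges = sorted((math.dist(points[i], points[j]), i, j)
                       for i, j in combinations(range(n), 2))
        diagram = []
        for length, i, j in edges:
            ri, rj = find(i), find(j)
            if ri != rj:  # this edge merges two components, so one of them dies
                parent[ri] = rj
                diagram.append((0.0, length))
        diagram.append((0.0, math.inf))  # the component that survives
        return diagram
    ```

    The all-pairs edge list grows quadratically with the number of points, so this is only practical for small inputs; dedicated libraries are a better fit for full backbone structures.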
  3. Run geo_prep.py:

    • Execute geo_prep.py to create an index for your data and place it in the datasets_for_geo directory.
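
    Before indexing, it can help to confirm that every protein subdirectory contains the Step 1 outputs. The sketch below is a hypothetical helper, not part of geo_prep.py; it assumes the file names follow the id-prefixed patterns listed in Step 1 and that the data lives in data_files.

    ```python
    from pathlib import Path

    # Suffixes taken from the Step 1 file list; adjust if your outputs differ.
    EXPECTED_SUFFIXES = ("_rec.pdb", "_rec_bb.txt", "_ba.txt", "_bl.txt", "_da.txt")

    def missing_outputs(data_dir="data_files"):
        """Map each protein id to the expected files absent from its subdirectory."""
        report = {}
        for sub in sorted(Path(data_dir).iterdir()):
            if not sub.is_dir():
                continue
            missing = [sub.name + suffix
                       for suffix in EXPECTED_SUFFIXES
                       if not (sub / (sub.name + suffix)).is_file()]
            if missing:
                report[sub.name] = missing
        return report
    ```

    Calling missing_outputs() from the repository root returns an empty dictionary when every subdirectory is complete.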
  4. Clone PaddleHelix:

    • Clone the PaddleHelix Repository into the root of the current directory.
    • Replace GEM in PaddleHelix/apps/pretrained_compound/ChemRL with the GEM folder in this repository.
  5. Install Dependencies:

    • Ensure all dependencies listed in requirements.txt are installed.
  6. Run knottrain.sh:

    • Locate the script in the cloned PaddleHelix repository at PaddleHelix/apps/pretrained_compound/ChemRL/GEM/scripts.
    • Modify the data_path in the script to point to the JSON files generated by geo_prep.py.
    • Execute via sh scripts/knottrain.sh. This will both preprocess the data and train the models described in the paper.
