# GNN_PersLay

## Contributors

- **Jason Apostol** (CS): Worked on model training and the multi-graph representation for protein structures.
- **Anna Su** (CBB): Worked on the persistent homology computation and persistence layer generation.
- **Vasilije Pantelic** (MB&B): Worked on the molecular geometry computations.

## Project Workflow

1. **Construct Data Files:**

   - Run the `run_scripts.sh` bash script to generate files in the `data_files` directory.
   - The script runs the Python scripts in `calculate_features` to create the following files for each subdirectory (a sketch of the underlying geometry computations follows the directory listing):
     1. Reconstructed PDB file (`id_rec.pdb`)
     2. Backbone structure file (`id_rec_bb.txt`)
     3. Bond angle file (`id_ba.txt`)
     4. Bond length file (`id_bl.txt`)
     5. Dihedral angle file (`id_da.txt`)

   Directory structure:

   ```
   data_files
   └── 1by7
       ├── 1by7_ba.txt
       ├── 1by7_bl.txt
       ├── 1by7_da.txt
       ├── 1by7_rec.pdb
       └── 1by7_rec_bb.txt
   ```
    
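   The scripts in `calculate_features` are not reproduced here. As a minimal sketch, and assuming a hypothetical backbone file layout of one `x y z` coordinate line per atom, the bond lengths, bond angles, and dihedral angles can be computed from consecutive backbone atoms with NumPy:

   ```python
   import numpy as np

   def bond_lengths(coords):
       """Distance between each pair of consecutive backbone atoms."""
       return np.linalg.norm(coords[1:] - coords[:-1], axis=1)

   def bond_angles(coords):
       """Angle (radians) at each interior atom, between its two neighbors."""
       b1 = coords[:-2] - coords[1:-1]  # center atom -> previous atom
       b2 = coords[2:] - coords[1:-1]   # center atom -> next atom
       cos = np.sum(b1 * b2, axis=1) / (
           np.linalg.norm(b1, axis=1) * np.linalg.norm(b2, axis=1))
       return np.arccos(np.clip(cos, -1.0, 1.0))

   def dihedral_angles(coords):
       """Torsion angle (radians) for each run of four consecutive atoms."""
       b0 = coords[1:-2] - coords[:-3]
       b1 = coords[2:-1] - coords[1:-2]
       b2 = coords[3:] - coords[2:-1]
       n1 = np.cross(b0, b1)  # normal of the plane through atoms 1-3
       n2 = np.cross(b1, b2)  # normal of the plane through atoms 2-4
       m1 = np.cross(n1, b1 / np.linalg.norm(b1, axis=1, keepdims=True))
       return np.arctan2(np.sum(m1 * n2, axis=1), np.sum(n1 * n2, axis=1))

   # Hypothetical file layout: one "x y z" line per backbone atom.
   coords = np.loadtxt("data_files/1by7/1by7_rec_bb.txt")
   print(bond_lengths(coords)[:3], bond_angles(coords)[:3], dihedral_angles(coords)[:3])
   ```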
2. **Construct PH Files:**

   - Run the `run_ph.sh` script on the `id_rec_bb.txt` files generated in Step 1.
   - This script uses `ph_functions.py` to compute persistence diagrams, vectors, and landscapes for each backbone structure. The results are stored in the `ph_files` directory. A minimal sketch of this kind of computation follows.
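   `ph_functions.py` is not reproduced here. The following sketch shows the same kind of computation, assuming GUDHI as the topology library (an assumption, not a confirmed dependency): a Vietoris-Rips filtration on the backbone coordinates, followed by a persistence landscape vectorization.

   ```python
   import numpy as np
   import gudhi
   from gudhi.representations import Landscape

   # Hypothetical file layout: one "x y z" line per backbone atom.
   points = np.loadtxt("data_files/1by7/1by7_rec_bb.txt")

   # Build a Vietoris-Rips filtration on the backbone point cloud.
   rips = gudhi.RipsComplex(points=points, max_edge_length=10.0)
   st = rips.create_simplex_tree(max_dimension=2)
   st.compute_persistence()

   # Persistence diagram of 1-dimensional features (loops); drop
   # essential classes with infinite death times before vectorizing.
   diagram = st.persistence_intervals_in_dimension(1)
   diagram = diagram[np.isfinite(diagram[:, 1])]

   # Vectorize the diagram as persistence landscapes.
   vec = Landscape(num_landscapes=5, resolution=100).fit_transform([diagram])
   np.savetxt("ph_files/1by7_landscape.txt", vec)
   ```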
3. **Run geo_prep.py:**

   - Execute `geo_prep.py` to create an index for your data and place it in the `datasets_for_geo` directory. A sketch of the general idea follows.
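   `geo_prep.py` defines the actual index format. The sketch below, with hypothetical field names, only illustrates the shape of this step: walking the `data_files` directory and collecting the per-structure files into a single JSON index.

   ```python
   import json
   from pathlib import Path

   # Hypothetical field names; see geo_prep.py for the real index layout.
   index = []
   for subdir in sorted(Path("data_files").iterdir()):
       if not subdir.is_dir():
           continue
       pid = subdir.name  # e.g. "1by7"
       index.append({
           "id": pid,
           "backbone": str(subdir / f"{pid}_rec_bb.txt"),
           "bond_angles": str(subdir / f"{pid}_ba.txt"),
           "bond_lengths": str(subdir / f"{pid}_bl.txt"),
           "dihedrals": str(subdir / f"{pid}_da.txt"),
       })

   Path("datasets_for_geo").mkdir(exist_ok=True)
   with open("datasets_for_geo/index.json", "w") as f:
       json.dump(index, f, indent=2)
   ```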
4. **Clone PaddleHelix:**

   - Clone the PaddleHelix repository into the root of this repository.
   - Replace `GEM` in `PaddleHelix/apps/pretrained_compound/ChemRL` with the `GEM` folder from this repository.
5. **Install Dependencies:**

   - Install all dependencies listed in `requirements.txt` (e.g. `pip install -r requirements.txt`).
6. **Run knottrain.sh:**

   - The script is in the forked PaddleHelix repository at `PaddleHelix/apps/pretrained_compound/ChemRL/GEM/scripts`.
   - Modify `data_path` in the script to point to the JSON files generated by `geo_prep.py`.
   - Execute via `sh scripts/knottrain.sh`. This both preprocesses the data and trains the models described in the paper.