DR-FWL-2

Dependencies

Code in this repository depends on the pygmmpp package (https://github.com/zml72062/pygmmpp), which provides a simple preprocessing API for graph datasets.

After downloading pygmmpp from the above URL, run make in its root directory to install the package.

Other requirements include:

  • python 3.9.12
  • numpy 1.21.5
  • pytorch 1.11.0
  • pytorch-scatter 2.0.9
  • pytorch-sparse 0.6.14
  • pytorch-geometric (pyg) 2.1.0
  • pytorch-lightning 2.0.1
  • wandb 0.14.0
  • torchmetrics 0.11.4
  • rdkit 2022.3.5
  • ogb 1.3.3
  • scikit-learn 1.1.1
  • scipy 1.7.3
  • h5py 3.7.0
  • tqdm 4.64.0
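Before running any experiment, it can help to compare the installed packages against the versions listed above. The following is a minimal sketch, not part of the repository: the dictionary keys are assumed PyPI distribution names (conda installs may register different names, notably for rdkit), and `check_versions` is a hypothetical helper.

```python
from importlib import metadata

# Pinned versions copied from the list above. The keys are ASSUMED PyPI
# distribution names; adjust them if your environment uses other names.
PINNED = {
    "numpy": "1.21.5",
    "torch": "1.11.0",
    "torch-scatter": "2.0.9",
    "torch-sparse": "0.6.14",
    "torch-geometric": "2.1.0",
    "pytorch-lightning": "2.0.1",
    "wandb": "0.14.0",
    "torchmetrics": "0.11.4",
    "rdkit": "2022.3.5",
    "ogb": "1.3.3",
    "scikit-learn": "1.1.1",
    "scipy": "1.7.3",
    "h5py": "3.7.0",
    "tqdm": "4.64.0",
}


def check_versions(pinned=PINNED):
    """Return {name: (expected, installed or None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Drop local build tags such as "+cu113" before comparing.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_versions().items():
        print(f"{name}: expected {expected}, found {installed}")
```

A mismatch does not necessarily mean the code will fail; the listed versions are the ones stated above, and nearby versions may or may not work.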

Counting dataset

To run the code on the Substructure Counting dataset, first run make in ./software/cycle_count. This compiles the C code into a .so binary and installs the Python module that generates the ground truth for substructure counting. After that, counting_dataset.py can be imported directly to get the dataset. Note that ABI compatibility issues may arise; we have only tested the program on x86-64 Linux and macOS. Alternatively, download the code from https://github.com/zml72062/cycle_count and run make in the root directory of that repository.

To run 2-DRFWL(2) GNN on the Substructure Counting dataset, run

python train_on_count.py --seed <random seed> --config-path configs/count.json

Training settings are saved in configs/count.json by default. NOTE: every time you modify the dataset.target setting in the configuration file, you must delete the processed directory under datasets/count so that the dataset is preprocessed again for the new target.

To run 3-DRFWL(2) GNN on the Substructure Counting dataset, run
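The cache-clearing step described above can be scripted. This is a minimal sketch under the assumption that the cached data lives in `datasets/count/processed`, as the paragraph states; `clear_processed` is a hypothetical helper, not something the repository provides.

```python
import shutil
from pathlib import Path


def clear_processed(root="datasets/count"):
    """Delete <root>/processed so the dataset is rebuilt on the next run.

    Returns True if a directory was removed, False if there was nothing
    to remove.
    """
    processed = Path(root) / "processed"
    if processed.is_dir():
        shutil.rmtree(processed)
        return True
    return False
```

Calling `clear_processed()` from the repository root after editing dataset.target should force preprocessing to run again on the next training invocation.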

python train_on_count.py --use_3 --seed <random seed> --config-path configs/count.json
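Results are usually reported over several random seeds, so it may be convenient to launch the two commands above in a loop. The sketch below only assembles the command lines shown in this section; `count_command` and `run_seeds` are hypothetical helpers, and the seed values are whatever you choose to pass in.

```python
import subprocess
import sys


def count_command(seed, use_3=False, config_path="configs/count.json"):
    """Build the argv for one train_on_count.py run."""
    cmd = [sys.executable, "train_on_count.py"]
    if use_3:
        # Selects 3-DRFWL(2) GNN instead of the default 2-DRFWL(2) GNN.
        cmd.append("--use_3")
    cmd += ["--seed", str(seed), "--config-path", config_path]
    return cmd


def run_seeds(seeds, use_3=False, dry_run=True):
    """Run one training job per seed, or only print it when dry_run=True."""
    for seed in seeds:
        cmd = count_command(seed, use_3=use_3)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop as soon as one run fails.
            subprocess.run(cmd, check=True)
```

For example, `run_seeds([0, 1, 2], use_3=True)` prints three commands without executing them; pass `dry_run=False` to actually start training.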

ZINC

/05/2023: 1. Slightly revised the preprocessing: it no longer computes initial features for 1-hop/2-hop edges; this is now done in the model. 2. Made the ZINC script runnable:

python train_zinc.py

QM9

To run 2-DRFWL(2) GNN on QM9, execute

python models_qm9.py --seed <random seed> --config-path configs/qm9.json

Training settings are saved in configs/qm9.json by default.

To run SSWL/SSWL+/LFWL/SLFWL GNN on QM9, execute

python models_qm9.py --seed <random seed> --config-path configs/qm9.json --lfwl <name>

where <name> is SSWL/SSWLPlus/LFWL/SLFWL.

EXP

To run 2-DRFWL(2) GNN on EXP dataset, execute

python run_exp.py --epochs <num of epochs>

To run 3-DRFWL(2) GNN on EXP dataset, execute

python run_exp.py --epochs <num of epochs> --use_3

SR25

To run 2-DRFWL(2) GNN on SR25 dataset, execute

python run_sr.py --num-epochs <num of epochs>

To run 3-DRFWL(2) GNN on SR25 dataset, execute

python run_sr.py --num_epochs <num of epochs> --use_3

BREC

Before running experiments on BREC, check the official repository of BREC for additional requirements. Then, download the raw dataset file "brec_v3.npy" from the official repository of BREC, and replace BREC/Data/raw/brec_v3.npy with it.

To run 2-DRFWL(2) GNN and 3-DRFWL(2) GNN on BREC dataset, execute

python test_BREC_search.py

ogbg-molhiv and ogbg-molpcba

To run 2-DRFWL(2) GNN on ogbg-molhiv/ogbg-molpcba dataset, execute

python ogbmol_models.py --config-path configs/ogbmolhiv.json
python ogbmol_models.py --config-path configs/ogbmolpcba.json

To run 3-DRFWL(2) GNN on ogbg-molhiv/ogbg-molpcba dataset, execute

python ogbmol_models.py --config-path configs/ogbmolhiv.json --use_3
python ogbmol_models.py --config-path configs/ogbmolpcba.json --use_3

Cycle counting on protein datasets

We collect three protein datasets from https://github.com/phermosilla/IEConv_proteins, two of which (ProteinsDBDataset and HomologyTAPEDataset) are used for cycle counting. See https://github.com/zml72062/ProteinsDataset for our original code that processes the three datasets.

Download the two datasets from the following URLs:

  • ProteinsDBDataset

https://drive.google.com/uc?export=download&id=1KTs5cUYhG60C6WagFp4Pg8xeMgvbLfhB

Extract into protdb/raw/ProteinsDB/

  • HomologyTAPEDataset

https://drive.google.com/uc?export=download&id=1chZAkaZlEBaOcjHQ3OUOdiKZqIn36qar

Extract into homology/raw/HomologyTAPE/

We copied the IEProtLib directory from https://github.com/phermosilla/IEConv_proteins, since our processing code makes use of this submodule. We also copied code from https://github.com/GraphPKU/I2GNN (the official code for "Boosting the Cycle Counting Power of Graph Neural Networks with I$^2$-GNNs") to run baseline methods (MPNN, NGNN, I2GNN and PPGN) on the two protein datasets.

Experiments on the two protein datasets depend on the h5py package.

To run 2-DRFWL(2) GNN/MPNN/NGNN/I2GNN/PPGN/SSWL/SSWL+/LFWL(2)/SLFWL(2) on ProteinsDBDataset, execute

python train_on_proteins.py --dataset ProteinsDB --model <model> --root protdb --target <target> --batch_size 32 --h 3 --cuda 0 --epochs 1500 --test_split <test-split>

where <model> is one of DRFWL2/MPNN/NGNN/I2GNN/PPGN/SSWL/SSWLPlus/LFWL/SLFWL, <target> is one of 3-cycle/4-cycle/5-cycle/6-cycle, and <test-split> is an integer from 0 to 9 selecting the test fold for 10-fold cross validation.
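Since <test-split> ranges over 0-9, a full 10-fold cross validation needs ten invocations per model and target. The sketch below enumerates them using only the flags shown in the command above; `proteinsdb_commands` is a hypothetical helper, and the fixed values (batch size 32, h 3, cuda 0, 1500 epochs) are copied from that command rather than verified defaults.

```python
import sys

# Target names as listed above.
TARGETS = ("3-cycle", "4-cycle", "5-cycle", "6-cycle")


def proteinsdb_commands(model, target, epochs=1500):
    """Return one argv list per test split (0-9) for ProteinsDBDataset."""
    if target not in TARGETS:
        raise ValueError(f"unknown target: {target!r}")
    commands = []
    for split in range(10):  # one run per cross-validation fold
        commands.append([
            sys.executable, "train_on_proteins.py",
            "--dataset", "ProteinsDB",
            "--model", model,
            "--root", "protdb",
            "--target", target,
            "--batch_size", "32",
            "--h", "3",
            "--cuda", "0",
            "--epochs", str(epochs),
            "--test_split", str(split),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`; averaging the ten test results is left to the caller.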

To run 2-DRFWL(2) GNN/MPNN/NGNN/I2GNN/PPGN/SSWL/SSWL+/LFWL(2)/SLFWL(2) on HomologyTAPEDataset, execute

python train_on_proteins.py --dataset HomologyTAPE --model <model> --root homology --target <target> --batch_size 32 --h 3 --cuda 0 --epochs 2000

Long range graph benchmark

To evaluate the ability of 2-DRFWL(2) GNN to capture long-range interactions, we conduct experiments on two datasets from the Long Range Graph Benchmark (https://arxiv.org/abs/2206.08164): peptides-functional and peptides-structural. To run 2-DRFWL(2) GNN on these two datasets, execute

python train_on_lrgb.py --name Functional --seed <random seed>
python train_on_lrgb.py --name Structural --seed <random seed>