doc: wip
jannisborn committed Jun 17, 2024
1 parent 499a133 commit d810028
Showing 1 changed file with 13 additions and 10 deletions.
23 changes: 13 additions & 10 deletions README.md
@@ -14,11 +14,11 @@ The code in this repo aims to provide a complete computational pipeline for targ
7. Create synthesis routes for best candidates


<img src="./assets//cycle.jpg" alt="logo" width="600"/>
<img src="./assets//cycle.jpg" alt="logo" width="400" align="center"/>

## 1 - Setup
### 1a - Install [GT4SD](https://github.com/GT4SD/gt4sd-core)
<img src="assets/GT4SD.png" width="100" height="100" align="right" />
<img src="assets/gt4sd.png" width="100" height="100" align="right" />
First we set up the environment.

```bash
@@ -45,7 +45,7 @@ python scripts/load_data.py \
--binary_labels
```

### 2 - Train the virtual screening model [Toxsmi](https://pubs.rsc.org/en/content/articlehtml/2023/dd/d2dd00099g)
## 2 - Train the virtual screening model [Toxsmi](https://pubs.rsc.org/en/content/articlehtml/2023/dd/d2dd00099g)

Assuming the datasets reside in the `data` folder (either from running the step above or from symlinking your own datasets),
you can start the training with the following command:
@@ -62,7 +62,7 @@ python scripts/train_toxsmi.py \
To change the batch size, number of epochs, etc., see `config/toxsmi_conf.json`.
For details, see the [Toxsmi paper](https://pubs.rsc.org/en/content/articlehtml/2023/dd/d2dd00099g).
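
If you prefer to adjust such a config file programmatically rather than by hand, the following is a minimal sketch. It assumes the config is a flat JSON object; the helper `override_config` and the key names in the usage comment are hypothetical, so check the actual file for the real keys.

```python
import json
from pathlib import Path
from typing import Any, Dict


def override_config(path: str, overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Load a JSON config, overwrite existing top-level keys, and save it back.

    Hypothetical helper, not part of this repository. Assumes the config
    is a flat JSON object (an assumption about the file layout).
    """
    config_path = Path(path)
    config = json.loads(config_path.read_text())

    # Refuse to add keys that the config does not already define, since a
    # misspelled key would otherwise be silently ignored downstream.
    unknown = set(overrides) - set(config)
    if unknown:
        raise KeyError(f"Keys not present in {path}: {sorted(unknown)}")

    config.update(overrides)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Example call (key name is an assumption, not taken from the repo):
# override_config("config/toxsmi_conf.json", {"batch_size": 64})
```

Rejecting unknown keys is a deliberate choice here: it surfaces typos early instead of letting a run proceed with default values.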

### 3 - Generate molecules with [MoLeR](https://github.com/microsoft/molecule-generation)
## 3 - Generate molecules with [MoLeR](https://github.com/microsoft/molecule-generation)
This step uses an iterative procedure combining a substructure-driven generative model (MoLeR) with the previously trained virtual screening model (ToxSmi) to produce a set of candidate molecules with high predicted binding affinity.
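
The iterative procedure can be pictured roughly as follows. This is only a sketch of a generic generate-and-screen loop, not the logic of `scripts/moler_generate_toxsmi.py`: the function `iterative_generation`, its parameters, and the toy generator and scorer are all hypothetical stand-ins for MoLeR and Toxsmi.

```python
from typing import Callable, List, Set


def iterative_generation(
    seeds: List[str],
    generate: Callable[[str], List[str]],
    score: Callable[[str], float],
    threshold: float = 0.5,
    n_iterations: int = 3,
) -> Set[str]:
    """Alternate between generation and screening.

    Molecules scoring at or above `threshold` are accepted and reused
    as seeds for the next round.
    """
    accepted: Set[str] = set()
    current = list(seeds)
    for _ in range(n_iterations):
        # Generate candidates around every current seed.
        candidates = {c for seed in current for c in generate(seed)}
        # Screen candidates with the scoring model.
        hits = {c for c in candidates if score(c) >= threshold}
        new_hits = hits - accepted
        if not new_hits:
            break  # nothing new passed the screen; stop early
        accepted |= new_hits
        current = sorted(new_hits)
    return accepted


# Toy stand-ins (NOT MoLeR or Toxsmi): "generation" appends one atom,
# "scoring" simply rewards longer strings.
def toy_generate(smiles: str) -> List[str]:
    return [smiles + "C", smiles + "O"]


def toy_score(smiles: str) -> float:
    return min(len(smiles) / 5.0, 1.0)


result = iterative_generation(
    ["CC"], toy_generate, toy_score, threshold=0.5, n_iterations=2
)
```

In the real pipeline, the threshold and number of iterations would correspond to the settings exposed in the MoLeR config file.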

NOTE: MoLeR is a *local* generative model, so `good_docks.smi` gives you a way to condition the generative process. You can place molecules there containing moieties that you would like to see in the final molecule, or take the best molecules from the affinity data that you have (see Step 1b). If you don't want to bias the model in any direction, we recommend passing a large `.smi` file (>1000 molecules) with diverse chemical structures.
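
A seed file for this step could be prepared along these lines. This is a minimal sketch that assumes a `.smi` file holds one SMILES string per line; the helpers `prepare_seed_smiles` and `write_smi` are hypothetical and not part of the repository. It removes only exact string duplicates and does not canonicalize SMILES.

```python
from pathlib import Path
from typing import Iterable, List

# Rule of thumb from the note above for an unbiased seed set.
MIN_UNBIASED_SEEDS = 1000


def prepare_seed_smiles(smiles: Iterable[str]) -> List[str]:
    """Strip whitespace, drop empty entries and exact duplicates, keep order."""
    seen = set()
    unique: List[str] = []
    for entry in smiles:
        entry = entry.strip()
        if entry and entry not in seen:
            seen.add(entry)
            unique.append(entry)
    return unique


def write_smi(smiles: List[str], path: str) -> None:
    """Write one SMILES per line (assumed `.smi` layout)."""
    Path(path).write_text("\n".join(smiles) + "\n")


seeds = prepare_seed_smiles(["CCO", " CCO ", "", "c1ccccc1"])
if len(seeds) < MIN_UNBIASED_SEEDS:
    # A small seed set is fine when you WANT to bias generation
    # towards specific moieties.
    print(f"{len(seeds)} seeds: generation will be biased towards them")
```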
@@ -83,9 +83,9 @@ python scripts/moler_generate_toxsmi.py \
where `best_F1.pt` contains the weights of the best Toxsmi model.

To change the threshold, number of iterations, etc., see `config/moler_conf.json`.
Read the MoLeR paper for more details: [link](https://arxiv.org/abs/2103.03864)
For details, read the [MoLeR paper](https://arxiv.org/abs/2103.03864).

### 4 - Generate more diverse molecules with [Regression Transformer](https://www.nature.com/articles/s42256-023-00639-z)
## 4 - Generate more diverse molecules with [Regression Transformer](https://www.nature.com/articles/s42256-023-00639-z)
This step refines and optimizes the molecules generated by MoLeR so that they are more drug-like.

Generate the dataset
@@ -102,9 +102,9 @@ python scripts/rt_generate.py \
```

To change the batch size, tolerance, etc., see `config/rt_conf.json`.
For details, read the [Regression Transformer paper](https://www.nature.com/articles/s42256-023-00639-z)
For details, read the [Regression Transformer paper](https://www.nature.com/articles/s42256-023-00639-z).

### 5 - Run inference on [Toxsmi](https://pubs.rsc.org/en/content/articlehtml/2023/dd/d2dd00099g)
## 5 - Run inference on [Toxsmi](https://pubs.rsc.org/en/content/articlehtml/2023/dd/d2dd00099g)
After generating a more diverse set of molecules, we screen the newly generated molecules with ToxSmi.
First we structure the input dataset by running:
```
@@ -120,15 +120,15 @@ python scripts/test_toxsmi.py \
```
This results in `models/toxsmi_model/results/dummy_data_F1_results_flat.csv`, which contains the predictions.

### 6 - Computing properties with GT4SD
## 6 - Computing properties with GT4SD
To ease postprocessing and manual inspection, we compute various physicochemical properties (logP, weight, ring count, drug-likeness) with GT4SD.
```bash
python scripts/mol_properties.py \
--smi_path models/toxsmi_model/results/dummy_data_F1_results_flat.csv \
--output_path mol_props.csv
```
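
For manual inspection, a CSV like this can be ranked by any numeric column. The sketch below uses only the standard library; the helper `rank_by_column`, the column names `smiles` and `score`, and the numbers in the example are placeholders, not the actual layout or contents of `mol_props.csv`.

```python
import csv
import io
from typing import List, Tuple


def rank_by_column(
    csv_text: str,
    smiles_col: str,
    value_col: str,
    top_k: int,
    descending: bool = True,
) -> List[Tuple[str, float]]:
    """Return the `top_k` (SMILES, value) pairs sorted by a numeric column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(row[smiles_col], float(row[value_col])) for row in reader]
    rows.sort(key=lambda item: item[1], reverse=descending)
    return rows[:top_k]


# Placeholder data: column names and values are made up for illustration.
example = "smiles,score\nCCO,0.1\nc1ccccc1,0.9\nCC(=O)O,0.5\n"
best = rank_by_column(example, "smiles", "score", top_k=2)
```

To apply it to a real file, read the file contents first (for example with `pathlib.Path(...).read_text()`) and pass the column names that actually appear in its header.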

### 7 - Retrosynthesis with [IBM RXN for Chemistry](https://rxn.app.accelerate.science/)
## 7 - Retrosynthesis with [IBM RXN for Chemistry](https://rxn.app.accelerate.science/)
Last, to ease wet-lab synthesis, we use [IBM RXN for Chemistry](https://rxn.app.accelerate.science/) to predict potential synthesis routes for each candidate molecule.

```bash
@@ -159,6 +159,9 @@ python scripts/retrosynthesis.py selected_for_retro.csv \

For further information on RXN's retrosynthesis models see [Schwaller et al. (2020)](https://pubs.rsc.org/en/content/articlehtml/2020/sc/c9sc05704h) and [Zipoli et al. (2024)](https://www.nature.com/articles/s41524-024-01290-x).


## Citations

If you use the code in this repository, please cite the papers that are part of this pipeline:
```bib
@article{manica2023accelerating,
