Commit

Modification of first tutorial to include references for the new CoV-2 datasets
Simon Axelrod committed Dec 2, 2020
1 parent df4d6ca commit 0be405b
Showing 2 changed files with 34 additions and 19 deletions.
51 changes: 33 additions & 18 deletions tutorials/01_loading_data.ipynb
@@ -11,7 +11,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This tutorial shows how to load and understand the GEOM data.\n"
"This tutorial shows how to load and understand the GEOM data. **Caution: We will only be updating the RDKit files as new data gets added to GEOM, and we will not be updating the messagepack files. Make sure you are using the RDKit files if you want the most up-to-date data.**\n"
]
},
{
@@ -38,7 +38,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
@@ -54,11 +54,16 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"drugs_file = \"drugs_crude.msgpack\"\n",
"import os\n",
"\n",
"# change to where your data is located\n",
"direc = \"/home/saxelrod/rgb_nfs/GEOM_NON_TAR\"\n",
"\n",
"drugs_file = os.path.join(direc, \"drugs_crude.msgpack\")\n",
"unpacker = msgpack.Unpacker(open(drugs_file, \"rb\"))\n"
]
},
@@ -73,7 +78,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
@@ -89,7 +94,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
@@ -107,7 +112,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 6,
"metadata": {},
"outputs": [
{
@@ -144,7 +149,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 7,
"metadata": {},
"outputs": [
{
@@ -171,7 +176,7 @@
" 'datasets': ['plpro', 'aid1706']}"
]
},
"execution_count": 6,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
@@ -204,7 +209,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 8,
"metadata": {},
"outputs": [
{
@@ -213,7 +218,7 @@
"84"
]
},
"execution_count": 7,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
@@ -231,7 +236,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 9,
"metadata": {},
"outputs": [
{
@@ -289,7 +294,7 @@
" 'conformerweights': [0.22575, 0.22548]}"
]
},
"execution_count": 8,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
@@ -340,7 +345,7 @@
"metadata": {},
"outputs": [],
"source": [
"qm9_features_file = \"qm9_featurized.msgpack\"\n",
"qm9_features_file = os.path.join(direc, \"qm9_featurized.msgpack\")\n",
"qm9_unpacker = msgpack.Unpacker(open(qm9_features_file, \"rb\"))\n",
"qm9_feat_1k = next(iter(qm9_unpacker))"
]
@@ -364,7 +369,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 14,
"metadata": {},
"outputs": [
{
@@ -787,7 +792,7 @@
" 'canon_smiles': 'CN1C[C@H]2NC[C@]21C#N'}"
]
},
"execution_count": 19,
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
@@ -842,6 +847,16 @@
"\n",
"Here we provide the references for the different possible values of the `datasets` key in the dictionaries:\n",
"\n",
"- `ellinger`:\n",
" - Bernhard Ellinger, Denisa Bojkova, Andrea Zaliani, Jindrich Cinatl, Carsten Claussen, Sandra Westhaus, Jeanette Reinshagen, Maria Kuzikov, Markus Wolf, Gerd Geisslinger, et al. Identification of inhibitors of SARS-CoV-2 in-vitro cellular toxicity in human (Caco-2) cells using a large scale drug repurposing collection. 2020.\n",
"\n",
"- `amu_sars_cov_2`: \n",
" - Franck Touret, Magali Gilles, Karine Barral, Antoine Nougairède, Etienne Decroly, Xavier de Lamballerie, and Bruno Coutard. In vitro screening of a FDA approved chemical library reveals potential inhibitors of SARS-CoV-2 replication. BioRxiv, 2020.\n",
" \n",
"- `mpro_xchem`: \n",
" - Main protease structure and XChem fragment screen. https://www.diamond.ac.uk/covid-19/for-scientists/Main-protease-structure-and-XChem.html.\n",
" \n",
" \n",
"- `aid1706`: \n",
" - Valerie Tokars and Andrew Mesecar. QFRET-based primary biochemical high throughput screening assay to identify inhibitors of the SARS coronavirus 3C-like Protease (3CLPro). https://pubchem.ncbi.nlm.nih.gov/bioassay/1706\n",
" - https://github.com/yangkevin2/coronavirus_data/blob/master/data/AID1706_binarized_sars.csv. Accessed: 2020-03-28\n",
@@ -871,9 +886,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:geom]",
"display_name": "Python 3",
"language": "python",
"name": "geom"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
Expand Down
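
The notebook change above hard-codes a user-specific data directory and opens the msgpack file without closing it. A minimal sketch of one alternative is shown here. It is not part of this commit: the helper names `geom_path` and `iter_batches` and the `GEOM_DATA_DIR` environment variable are hypothetical and chosen only for illustration.

```python
import os
from typing import Iterator, Optional


def geom_path(filename: str, direc: Optional[str] = None) -> str:
    """Join `filename` onto the GEOM data directory.

    If `direc` is not given, fall back to the GEOM_DATA_DIR environment
    variable (a hypothetical name, not defined by the GEOM repository),
    and then to the current directory.
    """
    if direc is None:
        direc = os.environ.get("GEOM_DATA_DIR", ".")
    return os.path.join(direc, filename)


def iter_batches(path: str) -> Iterator[dict]:
    """Yield the objects stored in a msgpack file one at a time."""
    # Imported lazily so that geom_path works without msgpack installed.
    import msgpack

    # The context manager closes the file once the generator is
    # exhausted or closed.
    with open(path, "rb") as handle:
        yield from msgpack.Unpacker(handle)
```

Under these assumptions, `next(iter_batches(geom_path("drugs_crude.msgpack")))` would play the same role as `next(iter(unpacker))` in the tutorial, while keeping the data location configurable per machine.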
2 changes: 1 addition & 1 deletion tutorials/02_loading_rdkit_mols.ipynb
@@ -2366,7 +2366,7 @@
"source": [
"In this case we only had to remove the `Br-` anion and then take the rest of the SMILES string.\n",
"\n",
"If you want to compare the SMILES in our data to the SMILES in another dataset, you can use `uncleaned_smiles` for our data and `Chem.MolToSmiles(Chem.MolFromSmiles(<their_smiles>))` for theirs. Our SMILES are already in canonical form, and application of `Chem.MolToSmiles(Chem.MolFromSmiles(` converts `<their_smiles>` to canonical form, too."
"If you want to compare the SMILES in our data to the SMILES in another dataset, you can use `uncleaned_smiles` for our data and `Chem.MolToSmiles(Chem.MolFromSmiles(<their_smiles>))` for theirs. Our SMILES are already in canonical form, and application of `Chem.MolToSmiles(Chem.MolFromSmiles())` converts `<their_smiles>` to canonical form, too."
]
},
{
Expand Down
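
The comparison described in the changed cell can be sketched as follows. This is an illustrative helper rather than code from the repository: the function names `rdkit_canonical` and `shared_smiles` are hypothetical, and the sketch assumes, as the tutorial states, that the GEOM SMILES are already canonical so only the external SMILES need converting. Treating unparsable external SMILES as non-matches is also an assumption.

```python
from typing import Callable, Iterable, List, Optional


def rdkit_canonical(smiles: str) -> Optional[str]:
    """Canonicalize a SMILES string with RDKit; None if it cannot be parsed."""
    # Imported lazily; calling this function requires RDKit to be installed.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    return None if mol is None else Chem.MolToSmiles(mol)


def shared_smiles(
    ours: Iterable[str],
    theirs: Iterable[str],
    canonicalize: Callable[[str], Optional[str]] = rdkit_canonical,
) -> List[str]:
    """Return the entries of `ours` that also appear in `theirs`.

    `ours` is assumed to be canonical already (e.g. `uncleaned_smiles`
    values from GEOM); each entry of `theirs` is canonicalized first.
    """
    theirs_canonical = {canonicalize(smiles) for smiles in theirs}
    # Drop the placeholder left by SMILES that failed to parse.
    theirs_canonical.discard(None)
    return [smiles for smiles in ours if smiles in theirs_canonical]
```

Passing the canonicalizer as an argument keeps the matching logic independent of RDKit, so it can be exercised with a stand-in function when RDKit is not available.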
