SamStokman/natural_product_structures

natural_product_structures

The natural_product_structures repository stores data and Python scripts for natural product (sub)structures.

Creating the Natural Product Structure Database

Databases used:

  • GNPS (CLASS)
  • Super Natural II (CLASS)
  • NuBBE (CLASS)
  • Human Metabolome (CLASS)
  • Yeast Metabolome (CLASS)
  • ChEBI (CLASS)
  • DrugBank (CLASS)
  • NANPDB (converted to CLASS, see python-scripts/nanpdb_CLASS_parser.py)
  • Streptomedb2 (converted to CLASS, see python-scripts/streptomedb_CLASS_parser.py)
  • NP Atlas (converted to CLASS, see python-scripts/NPAtlas_CLASS_parser.py)
  • Norine (converted to CLASS, see python-scripts/norine_CLASS_parser.py)

These CLASS databases are merged into a single file (Data/Structure_Database_File) with the script python-scripts/create_structure_db.py. The file is tab-separated, is 200,061 KB in size, and still contains structures that occur in more than one database.
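The merge step can be pictured with a minimal sketch. It is not the code in create_structure_db.py: the function name `merge_class_files`, the idea of prefixing each row with a database label, and the input file names are assumptions made for illustration only.

```python
def merge_class_files(sources, out_path):
    """Concatenate tab-separated files into one tab-separated file.

    `sources` maps a database label (hypothetical, e.g. "GNPS") to a file
    path. Each output row is the input row with the label prepended.
    Duplicate structures are kept on purpose; nothing is deduplicated here.
    """
    rows_written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for label, path in sources.items():
            with open(path, encoding="utf-8") as handle:
                for line in handle:
                    line = line.rstrip("\r\n")
                    if not line:
                        continue  # skip blank lines
                    out.write(f"{label}\t{line}\n")
                    rows_written += 1
    return rows_written
```

Keeping the source label on every row makes it possible to see later which databases a repeated structure came from.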

The script python-scripts/get_db_information.py reports the number of recognized and unique SMILES in the Structure_Database_File, as well as the number of structures that occur more than once. In total, 497,610 structures are recognized by their SMILES. The number of unique SMILES is 322,242.
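The three figures can be derived from a simple frequency count. The sketch below is an illustration, not the logic of get_db_information.py: the function name `summarize_smiles` is hypothetical, and "recognized" is assumed here to mean only that a non-empty SMILES string is present.

```python
from collections import Counter


def summarize_smiles(smiles_list):
    """Count recognized, unique, and repeated SMILES strings.

    Assumption: a structure counts as recognized when its SMILES field is
    a non-empty string. The real script may apply a stricter check.
    """
    counts = Counter(s.strip() for s in smiles_list if s and s.strip())
    return {
        "recognized": sum(counts.values()),  # all non-empty entries
        "unique": len(counts),               # distinct strings
        "repeated": sum(1 for n in counts.values() if n > 1),
    }
```

Note that comparing raw strings undercounts duplicates, because one structure can be written as several different SMILES strings.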

The SMILES of all recognized structures are made uniform by converting them to canonical SMILES. The structures are ordered by their canonical SMILES and stored in Data/Canonical_db_file.txt; the script python-scripts/create_canonical_SMILE_db.py creates this file.
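As a rough sketch of this step, the canonicalizer can be passed in as a function, so the ordering logic is independent of the toolkit. The names `build_canonical_records` and `rdkit_canonical` and the `(identifier, smiles)` record layout are assumptions for illustration; the repository's script may be organized differently.

```python
def rdkit_canonical(smiles):
    """Return the RDKit canonical SMILES, or None if parsing fails."""
    from rdkit import Chem  # imported lazily; requires RDKit to be installed

    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None


def build_canonical_records(records, canonicalize):
    """Canonicalize and sort (identifier, smiles) pairs.

    `canonicalize` is any function that maps a SMILES string to its
    canonical form, or to None when the string is not recognized.
    Returns a list of (canonical_smiles, identifier) sorted by SMILES,
    so identical structures end up next to each other.
    """
    result = []
    for identifier, smiles in records:
        canonical = canonicalize(smiles)
        if canonical is None:
            continue  # unrecognized structures are dropped
        result.append((canonical, identifier))
    result.sort()
    return result
```

With RDKit installed, `build_canonical_records(records, rdkit_canonical)` would group entries such as `OCC` and `CCO` under one canonical string.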

Canonical_db_file.txt is used to create the tables for the SQLite database (see Data/DatabaseDesign). The table files are generated by python-scripts/sqlite_data_table_creator.py, converted into SQLite tables with the scripts in the SQLite directory, and added to the Natural_Product_Structure.sqlite database (Dropbox).
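For readers unfamiliar with SQLite, the following sketch shows how canonical SMILES could be loaded into a table using Python's built-in sqlite3 module. The table name `structure`, its columns, and the function `load_structures` are hypothetical; the actual schema is the one described in Data/DatabaseDesign, and the repository uses its own scripts for this step.

```python
import sqlite3


def load_structures(connection, canonical_smiles):
    """Insert canonical SMILES into a hypothetical `structure` table.

    The UNIQUE constraint plus INSERT OR IGNORE means each canonical
    SMILES is stored once. Returns the number of rows in the table.
    """
    connection.execute(
        "CREATE TABLE IF NOT EXISTS structure ("
        "structure_id INTEGER PRIMARY KEY, "
        "canonical_smiles TEXT NOT NULL UNIQUE)"
    )
    connection.executemany(
        "INSERT OR IGNORE INTO structure (canonical_smiles) VALUES (?)",
        ((smiles,) for smiles in canonical_smiles),
    )
    connection.commit()
    return connection.execute("SELECT COUNT(*) FROM structure").fetchone()[0]
```

For a quick trial, `sqlite3.connect(":memory:")` gives a throwaway database that needs no file on disk.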

RDkit

The RDkit directory stores several Python scripts that test the most important functionality of RDKit. It also contains an RDKit docx file with further explanation of the installation and the functionality.
