Data integration and visualization tool taking in chemical structures and bioassays for chemical risk assessment (CBRA) for publication
yenlow/chemBioViz
# README instructions for chemical-biological read across (CBRA)
#
# Also in supplemental material of publication:
# Integrative Chemical–Biological Read-Across Approach for Chemical Hazard Classification.
# Yen Low, Alexander Sedykh, Denis Fourches, Alexander Golbraikh, Maurice Whelan, Ivan Rusyn, and Alexander Tropsha.
# Chemical Research in Toxicology 2013 26 (8), 1199-1208.
# DOI: 10.1021/tx400110f http://pubs.acs.org/doi/suppl/10.1021/tx400110f
#
# Yen Low ([email protected])
# 05 June 2013 version 1.0
##########################################################

################ OBJECTIVES OF PROGRAM ###################
1. Builds 4 models for comparison
   a) dual-space kNN of chemical and biological neighbors
   b) single-space kNN of chemical neighbors
   c) single-space kNN of biological neighbors
   d) single-space kNN of hybrid neighbors (i.e. chemical+biological spaces)
2. Generates radial plots shown in desired_output.pdf
   In each radial plot,
   - nodes represent compounds
   - central node represents the target compound to be predicted
   - nearest kbio biological neighbors are positioned left of vertical axis
   - nearest kchem chemical neighbors are positioned right of vertical axis
   - edge length is proportional to the Jaccard distance between target compound and its neighbor
   - nearest neighbors are positioned closest to the 12-o'clock position
   - colors denote the observed class of the compound (black=nontoxic,-1; red=toxic,+1)

################ DATA FILES PROVIDED ###################
Data set: Rat acute toxicity (Oral LD50)
   a) ld50_drg_n.xa (chemical descriptors)
   b) ld50_atp_csp_n.xa (biological descriptors)

################ SYSTEM REQUIREMENTS ###################
At least R 2.14 is recommended.
Tested on R 2.14 and 3.0, on Windows 7-8 and Ubuntu 12.10.

The following R packages are required:
(script will automatically install them if necessary)
1. boot
2. caret
3. class
4. e1071
5. plotrix
6. ROCR
7. vegan

###### INSTRUCTIONS #############################
STEP 1: Unzip .zip package.
Check that the following files are in the same folder:
1. ld50_drg_n.xa (chemical descriptors)
2. ld50_atp_csp_n.xa (biological descriptors)
3. master_script.R
4. multispaceNNobj.R
5. multispace_functions_AD_scaled.R
6. readXAfile.R
7. sampling.R
8. validationstats_BIN.R
9. bootstrapSD_BIN.R
10. desired_output.pdf
11. README.txt

STEP 2: Run master_script.R in one of the following ways

If running R in terminal mode (e.g. in Linux):
   Change to the CBRA folder, then at the command prompt, enter:
   Rscript master_script.R

OR

If running R GUI (e.g. in Windows):
   Step 2.1: Start R GUI.
   Step 2.2: Set working directory to where CBRA is unzipped to
      Within R GUI, go to File -> Change dir... and enter the file path of the CBRA folder
      OR enter: setwd("[file path of CBRA]")
      - Use forward slash "/" instead of backslash "\"
   Step 2.3: Open master_script.R by File -> Open
   Step 2.4: Run master_script.R. Within R Editor, go to Edit -> Run all

PROCESS TAKES 3 MINUTES ON A QUAD-CORE COMPUTER. PLEASE BE PATIENT.
(Each model consists of multiple models generated by 5-fold external cross-validation
and 10-fold internal cross-validation)

###################################################
Output files generated:
1. .pdf files (figures showing dual-space kNN)
   a) singlecpd.pdf (Compound #20's nearest biological and chemical neighbors)
   b) 6cpds_2by3grid.pdf (6 compounds' nearest biological and chemical neighbors in 2 by 3 grid)
   c) 4cpds_2by2grid.pdf (4 compounds' nearest biological and chemical neighbors in 2 by 2 grid)
2. .pred files (tables containing observed and predicted values of each compound)
   a) chem.pred
   b) gene.pred
   c) hybrid.pred
   d) dual.pred
3. validationstats_xxx.txt files (prediction performance of models, e.g. specificity, sensitivity, AUC)
   a) validationstats_chem.txt
   b) validationstats_gene.txt
   c) validationstats_hybrid.txt
   d) validationstats_dual.txt
4. dualspace.RData (.RData object containing data, models, objects for radial plots)
5. shuffleID.RData (.RData object containing randomizer seed used for cross-validation)

######## NOTES #########################
If dependencies cannot be loaded, manually install the required R packages using:
   install.packages("[Rpackage]")
For example, to install the R package boot, enter:
   install.packages("boot")

Required R packages: boot, caret, class, e1071, plotrix, ROCR, vegan

Please contact Yen Low at [email protected] to report bugs.
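
The file check in STEP 1 can be done from within R before running master_script.R. The following is a minimal sketch, not part of the CBRA package: the helper name missing_files is hypothetical, and the file list is copied from STEP 1. It assumes the working directory is the unzipped CBRA folder.

```r
# File names as listed under STEP 1 of this README.
expected_files <- c(
  "ld50_drg_n.xa", "ld50_atp_csp_n.xa", "master_script.R",
  "multispaceNNobj.R", "multispace_functions_AD_scaled.R",
  "readXAfile.R", "sampling.R", "validationstats_BIN.R",
  "bootstrapSD_BIN.R", "desired_output.pdf", "README.txt"
)

# Hypothetical helper (not in the CBRA scripts):
# returns the names in `files` that do not exist inside `dir`.
missing_files <- function(files, dir = ".") {
  files[!file.exists(file.path(dir, files))]
}

absent <- missing_files(expected_files)
if (length(absent) > 0) {
  message("Missing from ", getwd(), ": ", paste(absent, collapse = ", "))
} else {
  message("All ", length(expected_files), " files found.")
}
```

If any file is reported missing, re-unzip the package or correct the working directory with setwd() before continuing to STEP 2.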
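
To see which of the required packages are missing before installing anything, a check along the following lines may help. This is a sketch under assumptions, not the logic master_script.R itself uses: the helper name missing_pkgs is hypothetical, and the install.packages() call is left commented out because it needs network access and a CRAN mirror.

```r
# Package names as listed under "Required R packages" in this README.
required_pkgs <- c("boot", "caret", "class", "e1071", "plotrix", "ROCR", "vegan")

# Hypothetical helper (not in the CBRA scripts):
# returns the entries of `pkgs` that are not in `installed`.
missing_pkgs <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

to_install <- missing_pkgs(required_pkgs)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # Uncomment to install the missing packages in one call:
  # install.packages(to_install)
} else {
  message("All required packages are installed.")
}
```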
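
The radial plots place each neighbor at a distance proportional to its Jaccard distance from the target compound. The sketch below illustrates that idea on toy data only. It assumes binary (0/1) fingerprints, which may not match the descriptors in the .xa files, and the function names jaccard_dist and nearest_neighbors are hypothetical; the CBRA scripts may compute distances differently.

```r
# Jaccard distance between two binary (0/1) vectors:
# 1 - (number of shared 1s) / (number of positions where either is 1).
jaccard_dist <- function(a, b) {
  union_n <- sum(a | b)
  if (union_n == 0) return(0)  # assumption: two all-zero vectors count as identical
  1 - sum(a & b) / union_n
}

# Hypothetical helper: distances from row `target` of matrix `x` to its
# k nearest other rows, sorted from nearest to farthest.
nearest_neighbors <- function(x, target, k) {
  others <- setdiff(rownames(x), target)
  d <- sapply(others, function(r) jaccard_dist(x[target, ], x[r, ]))
  head(sort(d), k)
}

# Toy fingerprints, made up for illustration (not taken from the data files).
fp <- rbind(
  cpd1 = c(1, 1, 0, 0),
  cpd2 = c(1, 1, 1, 0),
  cpd3 = c(0, 0, 1, 1),
  cpd4 = c(1, 0, 0, 0)
)

print(nearest_neighbors(fp, "cpd1", k = 2))
```

With these toy values, cpd2 (distance 1/3) and cpd4 (distance 0.5) are the two nearest neighbors of cpd1. In a dual-space plot the same ranking would be done once in chemical space (kchem neighbors) and once in biological space (kbio neighbors).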