
Big GIM II: Drug Response KP

Guangrong Qin edited this page Feb 28, 2024 · 27 revisions

Multiomic Provider Tool Description - BigGIM-DrugResponse KP, also called BigGIM II, provides an API that exposes knowledge graphs (KGs) aggregated from public knowledge resources or derived as empirical associations from omics, drug screening, and functional screening datasets. It covers concepts such as diseases, drugs or chemicals, genes, and proteins, and it comprises multiple KGs.
BigGIM-DrugResponse KP is a KP created, supported, and maintained by a Multiomic Provider. Click here to view the Multiomic Provider page.

Summary

BigGIM-DrugResponse KP is provided by the Multiomic KP team. It includes both empirical findings derived from large datasets and knowledge aggregated from public resources and the literature. It extends the previous BigGIM, which covered only gene_gene_interactions, to more biological concepts such as diseases, drugs, and tissue types.

The node categories include Genes, Drugs (SmallMolecules), and Diseases. The edge categories (predicates) include Gene ~ gene_associated_with_condition ~ Disease and Gene (aspect qualifier: Genetic variants) ~ associated_with_sensitivity_to ~ SmallMolecule (aspect qualifier: IC50), among others.

The BigGIM II KGs include the following components:

  • BigGIM II - Drug response (mutation-based): To understand how different gene mutations are associated with different drug responses, we derived the BigGIM II - Drug response KG (mutation-based). Whole-exome sequencing data (gene mutation data) and drug screening data from the GDSC study (Iorio et al., 2016, Cell 166, 740–754. PMID:27397505) were used for knowledge graph construction. Resource data were downloaded from https://www.cancerrxgene.org/gdsc1000/GDSC1000_WebResources/Home.html. Based on the mutation status of each gene, cell lines in each tumor type were grouped into either the wild-type group or the mutated group. The significance of the difference in drug response IC50 values between the two groups was tested using Student's t-test. The effect size was measured using the following equation:
    import numpy as np

    def effect_size(x, y):
        n1 = len(x)  # x is a vector of IC50 values for the mutated samples
        n2 = len(y)  # y is a vector of IC50 values for the wild-type (wt) samples
        # pooled standard deviation of the two groups
        s = np.sqrt(((n1 - 1) * np.std(x) ** 2 + (n2 - 1) * np.std(y) ** 2) / (n1 + n2 - 2))
        d = (np.mean(x) - np.mean(y)) / s
        return d
    Associations with a p-value below 0.05 were ingested into the knowledge graph.

  • BigGIM II - Drug response (gene expression based): To understand how the expression of different genes is associated with different drug responses, we developed the BigGIM II - Drug response KG (expression-based). Spearman correlations were calculated between gene expression (RMA gene expression values) and drug response Area Under the Curve (AUC) for cell lines in different tumor types in the GDSC project. A correlation was calculated only when more than 6 cell lines had both drug response data and gene expression values. For each tumor type, the KG records the gene (symbol), the drug name (which needs to be mapped to drug IDs), the correlation, the p-value, the sample size, and the tumor type. Resource data were downloaded from https://www.cancerrxgene.org/gdsc1000/GDSC1000_WebResources/Home.html. Please cite the original paper if the results are used (Iorio et al., 2016, Cell 166, 740–754. PMID:27397505).

  • BigGIM II - Gene Gene interaction (expression-based) KG: BigGIM II - Gene Gene interaction (expression-based) is an updated version of BigGIM I, with updated datasets from tumor-based co-expression or tissue-based gene expression. Update from the tumor-based co-expression: with the updated datasets from the TCGA PanCancer study, we used the new version of the gene expression values from the ISB-CGC PanCancer Atlas BigQuery tables (pancancer-atlas.Filtered.EBpp_AdjustPANCAN_IlluminaHiSeq_RNASeqV2_genExp_filtered) to generate the graph (BigGIM II - Gene Gene interaction (expr-expr)). Gene co-expression was computed using Pearson correlation. Only gene expression values observed in at least 25 samples were taken into consideration. The coefficient and p-value were derived from the Pearson correlation analysis.

  • BigGIM II - gene_associated_with_condition_Disease (Disease-Gene): This component describes which genes are frequently mutated in different tumor types, using gene mutation data from TCGA PanCancer. We used TCGA data to quantify gene mutation frequency at the patient level. Genes with a mutation frequency greater than 5% and more than 5 mutated samples were selected. To narrow down the list further, we kept only the genes identified as cancer driver genes in PMID:29625053. Only driver genes were exposed to the MultiomicsBigGIM_DrugResponse_KP as of the version updated in Sep 2022.
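
For the mutation-based drug response component above, the grouping step and the effect size can be sketched with the standard library alone. This is a minimal illustration, not the project's parser: the record layout, the function `split_by_mutation`, and the names `TUMOR_A` and `GENE_A` are hypothetical, and the IC50 values are invented. `statistics.pstdev` is used because it matches the default behaviour of `np.std`. The p-value filter is not shown; a t-test routine such as `scipy.stats.ttest_ind` would supply it.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pstdev


def effect_size(x, y):
    """Pooled-SD effect size; pstdev matches np.std with its default ddof=0."""
    n1, n2 = len(x), len(y)
    s = sqrt(((n1 - 1) * pstdev(x) ** 2 + (n2 - 1) * pstdev(y) ** 2) / (n1 + n2 - 2))
    return (mean(x) - mean(y)) / s


def split_by_mutation(records, gene):
    """Group IC50 values per tumor type into (mutated, wild-type) lists."""
    groups = defaultdict(lambda: ([], []))
    for rec in records:
        mutated, wild_type = groups[rec["tumor_type"]]
        target = mutated if gene in rec["mutated_genes"] else wild_type
        target.append(rec["ic50"])
    return dict(groups)


# Hypothetical toy records; all values are invented for illustration only.
records = [
    {"tumor_type": "TUMOR_A", "mutated_genes": {"GENE_A"}, "ic50": 1.0},
    {"tumor_type": "TUMOR_A", "mutated_genes": {"GENE_A"}, "ic50": 2.0},
    {"tumor_type": "TUMOR_A", "mutated_genes": {"GENE_A"}, "ic50": 3.0},
    {"tumor_type": "TUMOR_A", "mutated_genes": set(), "ic50": 4.0},
    {"tumor_type": "TUMOR_A", "mutated_genes": set(), "ic50": 5.0},
    {"tumor_type": "TUMOR_A", "mutated_genes": set(), "ic50": 6.0},
]

groups = split_by_mutation(records, "GENE_A")
mutated, wild_type = groups["TUMOR_A"]
d = effect_size(mutated, wild_type)
print(f"TUMOR_A / GENE_A: n_mut={len(mutated)}, n_wt={len(wild_type)}, d={d:.3f}")
```

A negative `d` here means the mutated group has the lower mean IC50 in the toy data.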
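
For the expression-based drug response component above, the sample-size rule and the Spearman correlation could look roughly like the sketch below. It computes Spearman's rho as the Pearson correlation of average ranks, written out by hand so that the block needs no third-party packages; it does not compute a p-value. The function names, the dictionary layout keyed by cell line, and the toy values are assumptions made for illustration.

```python
from math import sqrt

MIN_SAMPLES = 6  # the text requires MORE than 6 cell lines with both measurements


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def spearman_if_enough(expression, auc):
    """expression, auc: dicts keyed by cell line. Returns (rho, n) or None."""
    shared = sorted(set(expression) & set(auc))
    if len(shared) <= MIN_SAMPLES:
        return None  # too few cell lines with both values
    rho = pearson(
        average_ranks([expression[c] for c in shared]),
        average_ranks([auc[c] for c in shared]),
    )
    return rho, len(shared)


# Hypothetical toy data: 8 cell lines with expression, 7 of them with AUC.
expression = {f"line{i}": float(i) for i in range(8)}
auc = {f"line{i}": 1.0 - 0.1 * i for i in range(7)}

result = spearman_if_enough(expression, auc)
print(result)
```

Only cell lines present in both dictionaries enter the correlation, so the reported sample size is the size of that intersection.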
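
For the gene-gene interaction component above, a pairwise co-expression pass might be sketched as follows. One assumption is worth flagging: the sketch reads "observations in at least 25 samples" as "both genes observed in at least 25 common samples", which the source does not state explicitly. The function `coexpression_edges`, the gene names, and the toy values are hypothetical, and p-values are omitted.

```python
from itertools import combinations
from math import sqrt

MIN_SAMPLES = 25  # at least 25 samples, per the description above


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def coexpression_edges(expr):
    """expr maps gene -> per-sample values, with None where unobserved."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        # keep only samples where both genes have a value (assumed rule)
        pairs = [(u, v) for u, v in zip(expr[g1], expr[g2])
                 if u is not None and v is not None]
        if len(pairs) < MIN_SAMPLES:
            continue
        xs, ys = zip(*pairs)
        edges.append((g1, g2, pearson(xs, ys), len(pairs)))
    return edges


# Hypothetical toy matrix with 30 samples; GENE_C is observed in only 10.
expr = {
    "GENE_A": [float(i) for i in range(30)],
    "GENE_B": [2.0 * i + 1.0 for i in range(30)],
    "GENE_C": [float(i % 7) if i < 10 else None for i in range(30)],
}

edges = coexpression_edges(expr)
for g1, g2, r, n in edges:
    print(f"{g1} ~ {g2}: r={r:.3f}, n={n}")
```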
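
For the Disease-Gene component above, the two thresholds and the driver-gene filter can be illustrated with a short sketch. The function name, the input layout (a mapping from gene to the set of patients carrying a mutation, for a single tumor type), and the toy data are assumptions; the thresholds are the ones stated in the text.

```python
def frequently_mutated_drivers(mutations, n_patients, driver_genes,
                               min_freq=0.05, min_mutated=5):
    """Select driver genes mutated in >5% of patients and in >5 patients.

    mutations maps gene -> set of patient IDs with a mutation in that gene.
    Returns a dict of gene -> patient-level mutation frequency.
    """
    selected = {}
    for gene, patients in mutations.items():
        freq = len(patients) / n_patients
        if freq > min_freq and len(patients) > min_mutated and gene in driver_genes:
            selected[gene] = freq
    return selected


# Hypothetical toy cohort of 100 patients for one tumor type.
mutations = {
    "GENE_A": {f"P{i}" for i in range(20)},  # 20% mutated, driver: kept
    "GENE_B": {f"P{i}" for i in range(4)},   # 4% mutated, driver: below both thresholds
    "GENE_C": {f"P{i}" for i in range(30)},  # 30% mutated, not a driver: dropped
}
driver_genes = {"GENE_A", "GENE_B"}

selected = frequently_mutated_drivers(mutations, 100, driver_genes)
print(selected)
```

Both comparisons are strict, following the wording "greater than" in the description.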

Source Code - See the examples below for knowledge graph standardization and transformation.
Parser updated in June 2023
Parser updated in Aug 2022
Parser updated in Apr 2022

External Documentation (List of urls for documentation sites).
Deployment repo: https://github.com/IlyaLab/BigGIM/tree/main
SmartAPI link: https://biothings.ncats.io/biggim_drugresponse_kp
Meta graph for individual component: https://github.com/IlyaLab/BigGIM/blob/main/Graph_tables/Graph_table.csv
TRAPI ENDPOINT: https://api.bte.ncats.io/v1/smartapi/adf20dd6ff23dfe18e8e012bde686e31/query
Example for Query: https://github.com/IlyaLab/BigGIM/blob/main/Graph_Query/Query_BigGIM_DrugResponse_KP.ipynb
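
As a rough companion to the query notebook linked above, a one-hop TRAPI query body for the Gene ~ associated_with_sensitivity_to ~ SmallMolecule edge can be assembled as below. Treat it as a sketch under assumptions: the node and edge keys (`n0`, `n1`, `e0`), the helper names, and the use of an `NCBIGene` identifier are illustrative choices, and the notebook remains the authoritative example of what this KP accepts. The `post_query` helper is defined but not called here, so running the block makes no network request.

```python
import json
import urllib.request

# TRAPI endpoint as listed in this page's External Documentation section.
TRAPI_URL = "https://api.bte.ncats.io/v1/smartapi/adf20dd6ff23dfe18e8e012bde686e31/query"


def build_query(gene_curie):
    """Build a one-hop TRAPI query: gene -> small molecules it is linked to."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"ids": [gene_curie], "categories": ["biolink:Gene"]},
                    "n1": {"categories": ["biolink:SmallMolecule"]},
                },
                "edges": {
                    "e0": {
                        "subject": "n0",
                        "object": "n1",
                        "predicates": ["biolink:associated_with_sensitivity_to"],
                    }
                },
            }
        }
    }


def post_query(query, url=TRAPI_URL):
    """Send the query as JSON and return the decoded response (not called here)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Assumed identifier for illustration; the KP's accepted ID prefixes may differ.
query = build_query("NCBIGene:673")
body = json.dumps(query).encode("utf-8")
print(json.dumps(query, indent=2))
```

Calling `post_query(query)` would submit the request; results, if any, are returned under the `message` key of a TRAPI response.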
