Added example for problems with metabolite IDs in measured data

saezlab · Nov 21, 2024 · 501c537
1 parent e6ad4df
commit 501c537
Showing 1 changed file with 70 additions and 2 deletions.
diff --git a/vignettes/EnrichmentAnalysis.Rmd b/vignettes/EnrichmentAnalysis.Rmd
@@ -201,7 +201,6 @@ Given that we have the gene-metabolite-sets, we can now also run enrichment anal
 Yet, it is important to keep in mind that we generally detect fewer metabolites than genes, and hence this may bias the results obtained from combined enrichment analysis.
 
 ## MetaLinksDB Metabolite-receptor sets
-
 The MetaLinks database is a manually curated database of metabolite-receptor and metabolite-transporter sets that can be used to study the connection of metabolites and receptors or transporters [@Farr_Dimitrov2024].\
 
 ```{r}
@@ -222,6 +221,7 @@ MetaLinksDB_Res[["MetalinksDB_Type"]][c(1,50, 90, 101),]%>%
 \
 \
 
+
 ::: {.progress .progress-striped .active}
 ::: {.progress-bar .progress-bar-success style="width: 100%"}
 :::
@@ -409,12 +409,70 @@ Note that ordinarily, we should expect that the Trans2Orig tables do not have an
 
 <br>\
 
+
+# 4. Metabolite IDs in measured data
+The difficulty with assigning metabolite IDs to measured data lies in the uncertainty of metabolite detection. In particular, isomers (both constitutional isomers and stereoisomers) often cannot be differentiated experimentally; one example is the distinction between enantiomers. This leads to a loss of information and hence to uncertainty in assigning metabolite IDs.\
+One example is the metabolite Alanine, which occurs in an L- or a D-form. If an experiment did not distinguish these enantiomers, the correct approach is to assign either both metabolite IDs (L- and D-Alanine) or a more general Alanine ID without chiral information. In practice, however, this is not trivial:\
+```{r, echo=FALSE}
+#Create DF for Alanine:
+Alanine <- data.frame(
+  TrivialName = c("D-Alanine", "L-Alanine", "Alanine", "Alanine zwitterion"),
+  HMDB= c("HMDB0001310",  "HMDB0000161", NA, NA),
+  ChEBI = c("15570", "16977", "16449", "66916" ),
+  stringsAsFactors = FALSE)
+
+# Print table:
+Alanine%>%
+  kableExtra::kbl(caption = "Available Alanine IDs in HMDB and ChEBI.") %>%
+  kableExtra::kable_classic(full_width = F, html_font = "Cambria", font_size = 12)
+```
+\
+For instance, if we want to assign an HMDB ID, we have to assign both "HMDB0001310" and "HMDB0000161" to the metabolite Alanine. For ChEBI we could assign a single ID, "16449", but this may cause other problems, because this ChEBI ID is not stereospecific and may not be part of certain metabolic pathways. The reason is that substrate chirality is critical to enzymatic processes: enzymes are stereoselective, and biological systems are largely homochiral, with one enantiomer predominating (e.g. D-sugars, L-amino acids).\
+To showcase the severity of this problem, we can look at the occurrence of these metabolites in metabolic pathways across different databases. To do so, we searched for these metabolite IDs in the RaMP database [@Braisted2023] and extracted the pathways they are part of:\
+```{r, echo=FALSE}
+# This chunk expects the RaMP package and a RaMP database object named `rampDB`, e.g.:
+# devtools::install_github("ncats/RAMP-DB")
+# rampDB <- RaMP::RaMP(version = "2.5.4")
+
+Pathways <- RaMP::getPathwayFromAnalyte(db = rampDB, c("hmdb:HMDB0001310", "hmdb:HMDB0000161", "chebi:16449", "chebi:66916", "chebi:16977", "chebi:15570"))%>%
+ tidyr::unite(Pathway, c("pathwaySource", "pathwayName"), sep = ": ")
+
+Count <- Pathways %>%
+  dplyr::group_by(inputId) %>%
+  dplyr::summarise(PathwayCount = dplyr::n_distinct(Pathway), .groups = 'drop')%>%
+  tidyr::separate(inputId, into = c("Database", "ID"), sep = ":", remove = TRUE)
+
+Alanine <- merge(x = Alanine%>%
+                      tidyr::pivot_longer(cols = c("HMDB", "ChEBI"), 
+                                          names_to = "Database",
+                                          values_to = "ID")%>%
+                      dplyr::filter(!is.na(ID))  , 
+                 y= Count[,2:3],
+                 by = "ID",
+                 all.x=TRUE)%>%
+  dplyr::mutate(PathwayCount = replace(PathwayCount, is.na(PathwayCount), 0))%>%
+  dplyr::arrange(desc( TrivialName)) %>%
+  dplyr::select(TrivialName, ID, Database, PathwayCount)
+
+# Print table:
+Alanine%>%
+  kableExtra::kbl(caption = "Alanine IDs in HMDB and ChEBI mapped to pathways from WikiPathways, KEGG and Reactome using RaMP.") %>%
+  kableExtra::kable_classic(full_width = F, html_font = "Cambria", font_size = 12)
+
+#BarGraph?
+
+#* https://www.sciencedirect.com/science/article/pii/S0731708521005410
+#* This problem is even exacerbated in lipidomics, where different levels of residues....
+```
+\
+This shows that if we choose the generic ChEBI ID for Alanine (ChEBI ID 16449) because the enantiomers could not be distinguished experimentally, we will not map to any pathway, even though the metabolite is part of many pathways.\
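The assignment choice described in this section can be sketched in a few lines of base R. This is a minimal illustration only: `candidate_ids()` is a hypothetical helper, not a MetaProViz or RaMP function, and the lookup table simply repeats the Alanine IDs listed in the table above. It collects the IDs that remain valid candidates when the enantiomers were not resolved, and shows that HMDB offers no generic Alanine entry in that table while ChEBI does.

```r
# Alanine IDs as listed in the vignette table (HMDB and ChEBI).
alanine_ids <- data.frame(
  TrivialName = c("D-Alanine", "L-Alanine", "Alanine", "Alanine zwitterion"),
  HMDB  = c("HMDB0001310", "HMDB0000161", NA, NA),
  ChEBI = c("15570", "16977", "16449", "66916"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (assumption, not part of any package): return all
# non-missing IDs of one database for the given trivial names.
candidate_ids <- function(lookup, names, database) {
  ids <- lookup[lookup$TrivialName %in% names, database]
  ids[!is.na(ids)]
}

# Enantiomers were not distinguished, so both forms remain candidates.
unresolved <- c("D-Alanine", "L-Alanine")

hmdb_candidates <- candidate_ids(alanine_ids, unresolved, "HMDB")
chebi_specific  <- candidate_ids(alanine_ids, unresolved, "ChEBI")

# Generic (non-chiral) entry: present in ChEBI, absent in HMDB in this table.
chebi_generic <- candidate_ids(alanine_ids, "Alanine", "ChEBI")
hmdb_generic  <- candidate_ids(alanine_ids, "Alanine", "HMDB")

print(hmdb_candidates)
print(chebi_generic)
print(length(hmdb_generic))
```

Keeping both enantiomer-specific IDs preserves the link to pathway annotations, whereas the single generic ChEBI ID is more compact but, as the pathway counts above indicate, may not map to any pathway.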
+
+
 ::: {.progress .progress-striped .active}
 ::: {.progress-bar .progress-bar-success style="width: 100%"}
 :::
 :::
 
-# 4. Run enrichment analysis
+# 5. Run enrichment analysis
 
 ::: {.progress .progress-striped .active}
 ::: {.progress-bar .progress-bar-success style="width: 100%"}
@@ -452,10 +510,20 @@ The full scope of different methods is beyond the scope of MetaProViz, but are a
 #- Now we can at least do understand the classes present \* LipidMaps IDs, can we link other ID types?
 
 #Viz: - Showcase specific viz options - Add network plot
+
+
+
+
+
+#*Discuss chemical class enrichment, clustering before/after enrichment analysis. Also showcase metabolite-protein enrichment. Discuss how adding the additional columns (showcase helper function) will aid interpretability!
+#*Also discuss that chemical enrichment can be more informative and what are the disadvantages of pathways (specifically in metabolomics/lipidomics) ---> Makes it even more important to make the right visualisation choices as the feature space is not that big (compared to other omics)
+
+
 ```
 \
 \
 
+
 ::: {.progress .progress-striped .active}
 ::: {.progress-bar .progress-bar-success style="width: 100%"}
 :::