Added example for problems with metabolite IDs in measured data
ChristinaSchmidt1 committed Nov 21, 2024
1 parent e6ad4df commit 501c537
Showing 1 changed file with 70 additions and 2 deletions.
72 changes: 70 additions & 2 deletions vignettes/EnrichmentAnalysis.Rmd
Expand Up @@ -201,7 +201,6 @@ Given that we have the gene-metabolite-sets, we can now also run enrichment anal
Yet, it is important to keep in mind that we generally detect fewer metabolites than genes, and hence this may bias the results obtained from combined enrichment analysis.

## MetaLinksDB Metabolite-receptor sets

The MetaLinks database is a manually curated database of metabolite-receptor and metabolite-transporter sets that can be used to study the connection of metabolites and receptors or transporters [@Farr_Dimitrov2024].\

```{r}
Expand All @@ -222,6 +221,7 @@ MetaLinksDB_Res[["MetalinksDB_Type"]][c(1,50, 90, 101),]%>%
\
\


::: {.progress .progress-striped .active}
::: {.progress-bar .progress-bar-success style="width: 100%"}
:::
Expand Down Expand Up @@ -409,12 +409,70 @@ Note that ordinarily, we should expect that the Trans2Orig tables do not have an

<br>\


# 4. Metabolite IDs in measured data
The difficulty with assigning metabolite IDs to measured data is the uncertainty in the detection of metabolites. Indeed, differentiation of structural isomers (both constitutional isomers and stereoisomers), for example the distinction between enantiomers, is often not possible. This leads to a loss of information and hence to uncertainty in assigning metabolite IDs.\
One example is the metabolite Alanine, which can occur in its L- or D-form. If those enantiomers have not been distinguished in an experiment, the correct way would be to either assign two metabolite IDs (L- and D-Alanine) or a more general Alanine ID without chiral information. Yet, in reality this is not trivial:\
```{r, echo=FALSE}
#Create DF for Alanine:
Alanine <- data.frame(
TrivialName = c("D-Alanine", "L-Alanine", "Alanine", "Alanine zwitterion"),
HMDB= c("HMDB0001310", "HMDB0000161", NA, NA),
ChEBI = c("15570", "16977", "16449", "66916" ),
stringsAsFactors = FALSE)
# Print table:
Alanine%>%
kableExtra::kbl(caption = "Available Alanine IDs in HMDB and ChEBI.") %>%
kableExtra::kable_classic(full_width = F, html_font = "Cambria", font_size = 12)
```
\
For instance, if we want to assign an HMDB ID, we have to assign both "HMDB0001310" and "HMDB0000161" to the metabolite Alanine. For ChEBI we could assign only one ID, "16449", but this may lead to other problems, as this ChEBI ID is not stereospecific and may not be part of certain metabolic pathways. The reason for this is that substrate chirality is critical to enzymatic processes: enzymes are stereoselective and biological systems are largely homochiral, with a predominance of one particular enantiomer (e.g. D-sugars, L-amino acids, etc.).\
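A minimal sketch of this multi-ID assignment, using only the IDs named above: the lookup list `CandidateIDs` and the helper `ExpandIDs()` are hypothetical names introduced for illustration and are not part of MetaProViz. It expands an unresolved "Alanine" measurement into one row per candidate ID:

```{r}
# Hypothetical lookup: candidate IDs per measured (unresolved) metabolite name
CandidateIDs <- list(
  Alanine = list(
    HMDB  = c("HMDB0001310", "HMDB0000161"), # D- and L-Alanine
    ChEBI = "16449"                          # Alanine without chiral information
  )
)

# Expand the lookup to long format: one row per metabolite, database and candidate ID
ExpandIDs <- function(lookup) {
  rows <- lapply(names(lookup), function(met) {
    do.call(rbind, lapply(names(lookup[[met]]), function(db) {
      data.frame(Metabolite = met,
                 Database = db,
                 ID = lookup[[met]][[db]],
                 stringsAsFactors = FALSE)
    }))
  })
  do.call(rbind, rows)
}

AlanineLong <- ExpandIDs(CandidateIDs)
AlanineLong
```

Keeping every candidate ID as its own row makes the ambiguity explicit, so that downstream mapping steps can decide per database how to treat it.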
To showcase the severity of this problem, we can look at the occurrence of those metabolites in metabolic pathways across different databases. To do so, we searched for those metabolite IDs in the RaMP database [@Braisted2023] and extracted the pathways they are part of:\
```{r, echo=FALSE}
# Requires the RaMP package and an initialised database object `rampDB`, e.g.:
# devtools::install_github("ncats/RAMP-DB")
# rampDB <- RaMP::RaMP(version = "2.5.4")
Pathways <- RaMP::getPathwayFromAnalyte(db = rampDB, c("hmdb:HMDB0001310", "hmdb:HMDB0000161", "chebi:16449", "chebi:66916", "chebi:16977", "chebi:15570"))%>%
tidyr::unite(Pathway, c("pathwaySource", "pathwayName"), sep = ": ")
Count <- Pathways %>%
dplyr::group_by(inputId) %>%
dplyr::summarise(PathwayCount = dplyr::n_distinct(Pathway), .groups = 'drop')%>%
tidyr::separate(inputId, into = c("Database", "ID"), sep = ":", remove = TRUE)
Alanine <- merge(x = Alanine%>%
tidyr::pivot_longer(cols = c("HMDB", "ChEBI"),
names_to = "Database",
values_to = "ID")%>%
dplyr::filter(!is.na(ID)) ,
y= Count[,2:3],
by = "ID",
all.x=TRUE)%>%
dplyr::mutate(PathwayCount = replace(PathwayCount, is.na(PathwayCount), 0))%>%
dplyr::arrange(desc( TrivialName)) %>%
dplyr::select(TrivialName, ID, Database, PathwayCount)
# Print table:
Alanine%>%
kableExtra::kbl(caption = "Alanine IDs in HMDB and ChEBI mapped to pathways from WikiPathways, KEGG and Reactome using RaMP.") %>%
kableExtra::kable_classic(full_width = F, html_font = "Cambria", font_size = 12)
#BarGraph?
#* https://www.sciencedirect.com/science/article/pii/S0731708521005410
#* This problem is even exacerbated in lipidomics, where different levels of residues....
```
\
This shows that if we choose the general ChEBI ID for Alanine (ChEBI ID 16449) because the enantiomers could not be distinguished experimentally, we will not map to any pathway, even though the metabolite is part of many pathways.\
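One possible way to handle this, sketched here under stated assumptions and not implemented in MetaProViz, is to collect pathways over all candidate IDs of a measured metabolite instead of relying on a single general ID. The annotation table `ToyPathways` holds placeholder pathway names (not real RaMP output) and `PathwaysForCandidates()` is a hypothetical helper:

```{r}
# Toy pathway annotation with placeholder pathway names (NOT real RaMP output)
ToyPathways <- data.frame(
  ID = c("HMDB0000161", "HMDB0000161", "HMDB0001310"),
  Pathway = c("PathwayA", "PathwayB", "PathwayB"),
  stringsAsFactors = FALSE)

# Pathways of a measured metabolite = union over all of its candidate IDs
PathwaysForCandidates <- function(ids, annotation) {
  sort(unique(annotation$Pathway[annotation$ID %in% ids]))
}

# The general ChEBI ID alone is absent from the toy annotation: no pathway is returned
GeneralOnly <- PathwaysForCandidates("16449", ToyPathways)

# Both enantiomer IDs together: the pathways of either form are returned
BothForms <- PathwaysForCandidates(c("HMDB0001310", "HMDB0000161"), ToyPathways)
```

The trade-off is that the union may include pathways specific to only one enantiomer, so results obtained this way should be interpreted as candidates rather than confirmed memberships.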


::: {.progress .progress-striped .active}
::: {.progress-bar .progress-bar-success style="width: 100%"}
:::
:::

# 4. Run enrichment analysis
# 5. Run enrichment analysis

::: {.progress .progress-striped .active}
::: {.progress-bar .progress-bar-success style="width: 100%"}
Expand Down Expand Up @@ -452,10 +510,20 @@ The full scope of different methods is beyond the scope of MetaProViz, but are a
#- Now we can at least do understand the classes present \* LipidMaps IDs, can we link other ID types?
#Viz: - Showcase specific viz options - Add network plot
#*Discuss chemical class enrichment, clustering before/after enrichment analysis. Also showcase metabolite-protein enrichment. Discuss how adding the additional columns (showcase helper function) will aid interpretability!
#*Also discuss that chemical enrichment can be more informative and what are the disadvantages of pathways (specifically in metabolomics/lipidomics) ---> Makes it even more important to make the right visualisation choices as the feature space is not that big (compared to other omics)
```
\
\


::: {.progress .progress-striped .active}
::: {.progress-bar .progress-bar-success style="width: 100%"}
:::