This repository contains the scripts and data files used to generate the figures and the spectral library for the study that uses MS/MS fragmentation-based MassQL filters to differentiate bile acid isomers.
Figure 1. Development of MS/MS Fragmentation-Based MassQL Filters. a) Structure of bile acids highlighting mono-, di-, and tri-hydroxylated steroid cores, with experimentally observed potential hydroxylation sites on the steroid core indicated by red stars. b) MS/MS fragmentation spectra of the regioisomers taurochenodeoxycholic acid and taurodeoxycholic acid, illustrating a low-intensity mass region containing ions unique to each isomer. c) Enlarged view of the MS/MS fragmentation spectra for taurochenodeoxycholic and taurodeoxycholic acids, emphasizing the ion pair used to calculate relative intensity ratios for differentiating these isomers. d) Sequential MS/MS fragmentation-based filtering tree designed to classify regio- and stereoisomers of dihydroxylated bile acids. Structures at each filtering step are shown, with terminal bins color-coded for clarity. e) Retention time and MS/MS analysis confirmed the MassQL tree-predicted core structure of the bile acid conjugate deoxycholyl-putrescine by comparison with a synthetic standard.
Figure 2. Discovery and Validation of Deoxycholyl-2-Aminophenol in Biological Samples. a) Distribution of bile acids (n=776; bile acids with more than 2-fold change from the total 929) between small intestinal fluid and fecal samples. b) Assignment of the bile acid steroid core by MassQL queries and putative identification of the modification using high-resolution MS1 data. c) Validation of deoxycholyl-2-aminophenol through retention time matching in biological samples using the synthetic standard. d) Peak area abundances of deoxycholyl-2-aminophenol before and after anti-inflammatory ITIS diet intervention. e) Peak area abundances of deoxycholyl-2-aminophenol before and after MIND diet intervention. f) Percentage of participants with non-zero peak area abundances of deoxycholyl-2-aminophenol. A non-parametric Wilcoxon signed-rank test was performed for the boxplots in Figures 2d and 2e. Horizontal lines indicate the median value, the first (lower) and third (upper) quartiles are represented by the box edges, and vertical lines (whiskers) indicate the error range, which is 1.5 times the interquartile range.
The script in the folder BA_isomer_MassQL_filters can be used to identify the MS/MS scans captured by the bile acid isomer MassQL queries in an MGF file from an LC-MS/MS dataset.
How to use the bile acid isomer MassQL queries with an LC-MS/MS dataset: To apply the isomer MassQL queries to any untargeted LC-MS/MS dataset, we provide a custom Python script (see code availability section). The script takes as input any MGF file containing the MS/MS spectra from an LC-MS/MS dataset and outputs a .tsv file listing the MS/MS scan numbers captured by each MassQL query. The scans must then be binned sequentially by traversing each branch of the fragmentation tree, so that only the scans reaching one of the terminal bins are retained. [NOTE: The sequential binning is critical to reduce false hits. Please do not skip this step.] This step can be done in R or Python by taking the intersection of the scans captured by all the isomer groups along each branch of the filtering tree. For example, to get the MS/MS scans of 3,12α-(OH)2, find the MS/MS scans common to the “Dihydroxy”, “3,12α-(OH)2; 7,12α-(OH)2” and “3,12α-(OH)2” steps. We used this approach to analyze the four LC-MS/MS datasets in this study (Figures 2a, 2d-f); the script, called "common_scans", can be found in the folder "BA_isomer_MassQL_filters" deposited to GitHub.
Alternatively, the MassQL bile acid isomer library "GNPS-MASSQL-BILE-ACID-ISOMER", available on GNPS2 as a propagated library, can be used when running molecular networking jobs. Documentation on using the library is available at https://wang-bioinformatics-lab.github.io/GNPS2_Documentation/libraries/. However, the spectral matches should be inspected manually to ensure that the bile acid diagnostic peaks are detected in the MS/MS spectra, and annotations need to be validated by retention time matching with synthetic standards.
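The sequential binning step can be sketched in Python as follows. This is a minimal illustration, not the "common_scans" script itself: the function name scans_in_terminal_bin, the dictionary layout, and the scan numbers are all hypothetical, and it assumes the .tsv output has already been loaded into a mapping from query name to the set of scan numbers that query captured.

```python
def scans_in_terminal_bin(scans_by_query, branch):
    """Return the scans captured by every query along one branch of the tree.

    scans_by_query: dict mapping a query name to an iterable of scan numbers
                    (assumed layout; adapt to the actual .tsv columns).
    branch:         ordered list of query names from the root to a terminal bin.
    """
    if not branch:
        raise ValueError("branch must contain at least one query name")
    missing = [name for name in branch if name not in scans_by_query]
    if missing:
        raise KeyError(f"no scans recorded for queries: {missing}")
    # A scan reaches the terminal bin only if it passes every filter on the
    # branch, so the result is the intersection of the per-query scan sets.
    return set.intersection(*(set(scans_by_query[name]) for name in branch))


# Made-up scan numbers, for illustration only.
scans_by_query = {
    "Dihydroxy": {101, 102, 103, 104, 105},
    "3,12α-(OH)2; 7,12α-(OH)2": {102, 103, 105, 110},
    "3,12α-(OH)2": {103, 105, 120},
}

# Branch of the filtering tree leading to the 3,12α-(OH)2 terminal bin.
branch = ["Dihydroxy", "3,12α-(OH)2; 7,12α-(OH)2", "3,12α-(OH)2"]

result = scans_in_terminal_bin(scans_by_query, branch)
print(sorted(result))  # [103, 105]
```

Scan 120 matches the final query but is dropped because it never passed the upstream "Dihydroxy" filter, which is how sequential binning removes false hits.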