GitHub repository for 2022 MATH3888 Capstone Stream 2.
The stp folder contains the source code for the project.
mega and slim are almost identical folders; the main difference is that mega uses mega_list.csv and slim uses slim_list.csv, the files containing NAFLD-related proteins.
data contains the essential proteins list essential.csv, sourced from YeastMine, and the yeast network yeast.txt, sourced from STRING.
human_to_yeast converts a list of human proteins associated with NAFLD to their yeast homologs through the YeastMine API.
stp.py contains functions to convert between NetworkX graph instances and .stp files.
mega_list.csv was sourced from Mega-Human-Gene-List.xlsx, a file curated by biochemistry students that lists, for every paper, the proteins associated with NAFLD, the name of the paper and the NAFLD detection methods [3-10]. This was manually converted into a single-line form for ease of use. A p-value threshold determined which proteins were included in mega_list.csv. slim_list.csv is derived from the same papers but with a stricter p-value threshold. At most 250 proteins were accepted from any one paper.
The two lists were used to test the methods on two different sizes of data and on two lists where we have a varying degree of confidence in the strength of proteins' association with NAFLD.
To obtain essential.csv:
- Go to YeastMine
- Go to Phenotypes (in the grey middle bar), click on "Phenotypes --> Genes", then choose "=" "inviable".
To obtain yeast.txt:
- Go to STRING
- Go to "Download" and enter Saccharomyces cerevisiae into the "organism name" dropdown menu
- Download the file 4932.protein.links.v11.5.txt.gz
- Extract, delete the header and rename to yeast.txt
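The extract, delete-header and rename steps above can also be scripted. The following is a minimal sketch, not the repository's own code: it assumes the STRING file has one header line followed by whitespace-separated rows of the form `protein1 protein2 combined_score`. The function names and the `min_score` filter are hypothetical and only illustrate one way to read the file.

```python
import gzip


def extract_links(gz_path, out_path):
    """Decompress a STRING links file and drop its one-line header."""
    with gzip.open(gz_path, "rt") as src, open(out_path, "w") as dst:
        next(src, None)      # skip the header line
        dst.writelines(src)  # copy the remaining rows unchanged


def read_links(path, min_score=0):
    """Read (protein_a, protein_b, score) rows from a header-free links file.

    Each unordered pair is kept once, in case the file lists a pair in both
    directions. min_score is a hypothetical filter; the default of 0 keeps
    every row.
    """
    seen = {}
    with open(path) as handle:
        for line in handle:
            parts = line.split()
            if len(parts) != 3:
                continue  # ignore blank or malformed rows
            score = int(parts[2])
            if score < min_score:
                continue
            key = tuple(sorted((parts[0], parts[1])))
            seen.setdefault(key, score)
    return [(a, b, score) for (a, b), score in seen.items()]
```

The returned triples can be loaded into a NetworkX graph with `Graph.add_weighted_edges_from`, for example `graph.add_weighted_edges_from(read_links("yeast.txt"), weight="score")`.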
Required packages can be installed by running pip install -r requirements.
Downloading SCIP-Jack to solve Steiner Tree Problem
Downloading and compiling SCIP-Jack directly on a Windows machine proved very painful. Install WSL or use a VM instead, and follow the Linux + make download option.
- Download the SCIP optimization suite: https://scipopt.org/index.php#download
- (Linux only) Navigate to directory containing download and run the following commands to build SCIP, replacing x.y.z with the relevant version number:
tar xvzf scipoptsuite-x.y.z.tgz # unpack the tarball
cd scipoptsuite-x.y.z # change into the directory
make # start compiling SCIP
make test # (recommended) check your SCIP installation
- (Linux only) Navigate to the applications/STP folder and run make to build SCIP-Jack
- Write the graph to filename.stp, where .stp is the file format SCIP-Jack uses to read an STP instance as input. For more information, see http://steinlib.zib.de/format.php.
- Create a file called 'write.set' in folder /applications/STP/settings with content
stp/logfile = "use_probname"
- Run the command
bin/stp -f filename.stp -s settings/write.set
This will create a .stplog file that contains the solution to the STP.
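Once the .stplog file exists, the solution edges can be pulled out with a few lines of Python. This is a rough sketch under a loud assumption: that the log lists the solution in a block opened by `SECTION Finalsolution`, with one `E u v` line per edge and a closing `End`. Check this against an actual log produced by your SCIP-Jack build before relying on it. The function name is hypothetical.

```python
def read_solution_edges(log_text):
    """Collect (u, v) vertex-index pairs from the final-solution section.

    Assumes (not verified against every SCIP-Jack version) that the log has
    a 'SECTION Finalsolution' block containing lines of the form 'E u v'.
    """
    edges = []
    in_solution = False
    for raw in log_text.splitlines():
        parts = raw.split()
        if not parts:
            continue
        keyword = parts[0].lower()
        if keyword == "section":
            # Enter the block only when the section is the final solution.
            in_solution = len(parts) > 1 and parts[1].lower() == "finalsolution"
        elif keyword == "end":
            in_solution = False
        elif in_solution and parts[0] == "E" and len(parts) >= 3:
            edges.append((int(parts[1]), int(parts[2])))
    return edges
```

The vertex indices are the integers used in the .stp file, so they need to be mapped back to protein names using whatever numbering was applied when the .stp file was written.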
stp.py was used to convert between NetworkX graph instances and the .stp format required by SCIP-Jack. For the purposes of reproducibility, the SCIP Optimization Suite version downloaded was 8.0.2.
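To illustrate what such a conversion involves, here is a minimal sketch of writing a graph in the SteinLib .stp format. It is not the code in stp.py: the function name, its arguments and the choice to return the label-to-index mapping are all assumptions made for this example. It takes plain iterables so that it does not depend on any particular graph library.

```python
def to_stp(nodes, weighted_edges, terminals, name="network"):
    """Return (.stp text, label-to-index mapping) for an undirected graph.

    nodes:          iterable of hashable node labels
    weighted_edges: iterable of (u, v, weight) triples
    terminals:      iterable of node labels the Steiner tree must connect
    """
    # SteinLib numbers vertices 1..n, so map each label to an integer.
    index = {node: i for i, node in enumerate(nodes, start=1)}
    edges = [(index[u], index[v], w) for u, v, w in weighted_edges]
    terms = [index[t] for t in terminals]

    lines = ["33D32945 STP File, STP Format Version 1.0", ""]
    lines += ["SECTION Comment", f'Name "{name}"', "END", ""]
    lines += ["SECTION Graph", f"Nodes {len(index)}", f"Edges {len(edges)}"]
    lines += [f"E {u} {v} {w}" for u, v, w in edges]
    lines += ["END", ""]
    lines += ["SECTION Terminals", f"Terminals {len(terms)}"]
    lines += [f"T {t}" for t in terms]
    lines += ["END", "", "EOF"]
    return "\n".join(lines) + "\n", index
```

With a NetworkX graph `G` whose edges carry a `weight` attribute, a call could look like `text, index = to_stp(G.nodes, G.edges(data="weight", default=1), terminals)`, after which `text` is written to filename.stp. Keeping `index` makes it possible to translate the integer vertices in the solver output back to the original node labels.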
- D. Rehfeldt, T. Koch: Implications, conflicts, and reductions for Steiner trees. Integer Programming and Combinatorial Optimization: 22nd International Conference, IPCO 2021, Mohit Singh and David P. Williamson (Eds.), ISBN: 978-3-030-73879-2, Lecture Notes in Computer Science Vol. 12707
- D. Rehfeldt, Y. Shinano, T. Koch: SCIP-Jack: An exact high performance solver for Steiner tree problems in graphs and related problems. Modeling, Simulation and Optimization of Complex Processes, HPSC 2018, Hans Georg Bock, Willi Jäger, Ekaterina Kostina, Hoang Xuan Phu (Eds.), ISBN: 978-3-030-55240-4
- Ryaboshapkina, M., Hammar, M. Human hepatic gene expression signature of non-alcoholic fatty liver disease progression, a meta-analysis. Sci Rep 7, 12361 (2017). https://doi.org/10.1038/s41598-017-10930-w
- Hasin-Brumshtein, Y., Sakaram, S., Khatri, P. et al. A robust gene expression signature for NASH in liver expression data. Sci Rep 12, 2571 (2022). https://doi.org/10.1038/s41598-022-06512-0
- Anstee QM, Darlay R, Cockell S, Meroni M, Govaere O, Tiniakos D, Burt AD, Bedossa P, Palmer J, Liu YL, Aithal GP, Allison M, Yki-Järvinen H, Vacca M, Dufour JF, Invernizzi P, Prati D, Ekstedt M, Kechagias S, Francque S, Petta S, Bugianesi E, Clement K, Ratziu V, Schattenberg JM, Valenti L, Day CP, Cordell HJ, Daly AK; EPoS Consortium Investigators. Genome-wide association study of non-alcoholic fatty liver and steatohepatitis in a histologically characterised cohort. J Hepatol. 2020 Sep;73(3):505-515. doi: 10.1016/j.jhep.2020.04.003. Epub 2020 Apr 13. Erratum in: J Hepatol. 2021 Mar 4. PMID: 32298765.
- Hoang, S.A., Oseini, A., Feaver, R.E. et al. Gene Expression Predicts Histological Severity and Reveals Distinct Molecular Profiles of Nonalcoholic Fatty Liver Disease. Sci Rep 9, 12541 (2019). https://doi.org/10.1038/s41598-019-48746-5
- Miao Z, Garske KM, Pan DZ, Koka A, Kaminska D, Männistö V, Sinsheimer JS, Pihlajamäki J, Pajukanta P. Identification of 90 NAFLD GWAS loci and establishment of NAFLD PRS and causal role of NAFLD in coronary artery disease. HGG Adv. 2021 Aug 24;3(1):100056. doi: 10.1016/j.xhgg.2021.100056. PMID: 35047847; PMCID: PMC8756520.
- Namjou, B., Lingren, T., Huang, Y. et al. GWAS and enrichment analyses of non-alcoholic fatty liver disease identify new trait-associated genes and pathways across eMERGE Network. BMC Med 17, 135 (2019). https://doi.org/10.1186/s12916-019-1364-z
- Trépo E, Valenti L. Update on NAFLD genetics: From new variants to the clinic. J Hepatol. 2020 Jun;72(6):1196-1209. doi: 10.1016/j.jhep.2020.02.020. Epub 2020 Mar 4. PMID: 32145256.
- Anstee QM, Day CP. The genetics of NAFLD. Nat Rev Gastroenterol Hepatol. 2013 Nov;10(11):645-55. doi: 10.1038/nrgastro.2013.182. Epub 2013 Sep 24. PMID: 24061205.