Figure6.md
Anusri Pampari edited this page Jan 10, 2025
Reproducing Figures 6 and 7.
- Download TSV files from https://www.synapse.org/Synapse:syn64126763. (We will release the CAGI TSV files once we receive permission from the original authors.)
- Run notebooks in src/figures_factory/Fig6/ to generate the figures
- Run src/figures_factory/Fig6/preprocessing/format_dsqtls.ipynb with the following inputs:
  - Download Supplementary Table 1 from the deltaSVM study (Lee et al. 2015), available at https://static-content.springer.com/esm/art%3A10.1038%2Fng.3331/MediaObjects/41588_2015_BFng3331_MOESM26_ESM.xlsx
  - Obtain the observed effect sizes from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE31388 (file: GSE31388_dsQtlTable.txt.gz).
  - Obtain ChromBPNet predictions using https://github.com/kundajelab/variant-scorer/ for the corresponding models
  - Obtain Enformer (reported) scores from https://console.cloud.google.com/storage/browser/dm-enformer/variant-scores/1000-genomes/enformer;tab=objects?prefix=&forceOnObjectsSortingFiltering=false
  - TODO: script to recompute Enformer scores
- Run src/figures_factory/Fig6/preprocessing/format_kumsaka_caqtls.ipynb with the following inputs:
  - Download summary stats and QTLs from https://zenodo.org/records/13848268 (file: QTD100018.all.tsv.gz) and the corresponding peaks (file: QTD100018_peak_metadata.tsv.gz)
  - Obtain broad peaks called across individuals (correspondence: Kaur Alasoo), and expand to ±100 bp around the summit for the top 50K peaks of each individual
  - Obtain ChromBPNet predictions using https://github.com/kundajelab/variant-scorer/ for the corresponding models
  - Obtain Enformer (reported) scores from https://console.cloud.google.com/storage/browser/dm-enformer/variant-scores/1000-genomes/enformer;tab=objects?prefix=&forceOnObjectsSortingFiltering=false
  - TODO: script to recompute Enformer scores
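The peak-expansion step above (top 50K peaks per individual, ±100 bp around the summit) could be sketched roughly as follows. This is only an illustration, not the notebook's code: the record keys (`individual`, `chrom`, `summit`, `score`) are hypothetical names, ranking by `score` is an assumption, and the windows are assumed to be 0-based, half-open intervals.

```python
from collections import defaultdict


def top_summit_windows(peaks, top_n=50_000, flank=100):
    """Return (individual, chrom, start, end) windows around peak summits.

    `peaks` is an iterable of dicts with the hypothetical keys
    "individual", "chrom", "summit" (0-based position) and "score".
    """
    by_individual = defaultdict(list)
    for peak in peaks:
        by_individual[peak["individual"]].append(peak)

    windows = []
    for individual, rows in by_individual.items():
        # Keep the top_n highest-scoring peaks for this individual.
        ranked = sorted(rows, key=lambda r: r["score"], reverse=True)
        for row in ranked[:top_n]:
            start = max(0, row["summit"] - flank)  # clip at chromosome start
            end = row["summit"] + flank
            windows.append((individual, row["chrom"], start, end))
    return windows
```

Whether the real pipeline ranks peaks by a score column, and how it treats interval boundaries, should be checked against the notebook itself.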
- Run src/figures_factory/Fig6/preprocessing/format_af_caqtls.ipynb and src/figures_factory/Fig6/preprocessing/format_af_caqtls_asb.ipynb with the following inputs:
  - Obtain effect sizes from the original source paper https://pmc.ncbi.nlm.nih.gov/articles/PMC10659267/ (correspondence: Soumya Kundu)
  - Obtain broad peaks called across individuals (correspondence: Kaur Alasoo), and expand to ±100 bp around the summit for the top 50K peaks of each individual
  - Obtain ChromBPNet predictions using https://github.com/kundajelab/variant-scorer/ for the corresponding models
  - Obtain Enformer (reported) scores from https://console.cloud.google.com/storage/browser/dm-enformer/variant-scores/1000-genomes/enformer;tab=objects?prefix=&forceOnObjectsSortingFiltering=false
  - TODO: script to recompute Enformer scores
- Run src/figures_factory/Fig6/preprocessing/format_smc_caqtls.ipynb with the following inputs:
  - We extracted 1,984 significant caQTLs, along with their RASQUAL effect sizes, from smooth muscle cells (SMCs) as reported by Turner et al. (2022). These data were sourced from Supplementary Data 6, available at https://static-content.springer.com/esm/art%3A10.1038%2Fs41588-022-01069-0/MediaObjects/41588_2022_1069_MOESM10_ESM.xlsx.
  - Filter these caQTLs to the peaks provided at https://www.synapse.org/Synapse:syn59479965
  - Obtain ChromBPNet predictions using https://github.com/kundajelab/variant-scorer/ for the corresponding models
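Filtering caQTLs to a peak set, as in the step above, amounts to an interval-membership test. The sketch below is one possible way to do it with the standard library; it is not taken from the notebook. It assumes peaks are given as `(chrom, start, end)` tuples in 0-based, half-open coordinates and that variant positions are 0-based. The function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_peak_index(peaks):
    """Merge overlapping peaks per chromosome and store sorted starts/ends."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                # Overlaps or touches the previous interval: extend it.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_peak(index, chrom, pos):
    """True if the 0-based position falls inside any peak on `chrom`."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]
```

A caQTL table could then be filtered by keeping only the rows for which `in_peak(index, chrom, pos)` is true.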
- Run src/figures_factory/Fig6/preprocessing/format_microglia_caqtls.ipynb with the following inputs:
  - We downloaded SNPs from https://www.synapse.org/#!Synapse:syn30863713 and obtained effect sizes by merging the "PeakID" and "TopSNPperPeak" columns with the microgliamacrophagemeta-caQTLsummaryresult (https://www.synapse.org/#!Synapse:syn30308248) on its "Peak" and "Variant" columns, respectively. This resulted in 4,978 caQTL effects, including effects of single variants on multiple peaks. We then retrieved caQTL positions either by querying dbSNP with the rsID attributes or by splitting the attributes to obtain chromosome and position.
  - Filter to the microglia peaks provided at https://www.synapse.org/#!Synapse:syn269491355, yielding 956 caQTL effect sizes.
  - Obtain ChromBPNet predictions using https://github.com/kundajelab/variant-scorer/ for the corresponding models
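The merge described above joins the SNP table to the summary results on the column pairs ("PeakID", "TopSNPperPeak") and ("Peak", "Variant"). A minimal sketch of that join, plus the attribute splitting, might look like the following. The assumptions are loud ones: each (Peak, Variant) pair is taken to be unique in the summary table, and non-rsID variants are assumed to look like `chr1:12345:A:G` or `chr1_12345_A_G`, which may not match the actual files. The function names and the `Beta` field used in the usage note are hypothetical.

```python
def merge_effects(top_snps, summary):
    """Inner-join two lists of dict rows on (PeakID, TopSNPperPeak) == (Peak, Variant)."""
    # Assumes each (Peak, Variant) pair occurs once in the summary table.
    effects = {(row["Peak"], row["Variant"]): row for row in summary}
    merged = []
    for row in top_snps:
        hit = effects.get((row["PeakID"], row["TopSNPperPeak"]))
        if hit is not None:
            merged.append({**row, **hit})
    return merged


def parse_position(variant_id):
    """Return (chrom, pos) from a 'chrom:pos[:ref:alt]'-style ID.

    Returns None for rsIDs (which need a dbSNP lookup instead) and for
    IDs that do not match the assumed pattern.
    """
    if variant_id.startswith("rs"):
        return None
    parts = variant_id.replace("_", ":").split(":")
    if len(parts) < 2 or not parts[1].isdigit():
        return None
    return parts[0], int(parts[1])
```

For example, `parse_position("chr1:100:A:G")` gives `("chr1", 100)`, while an rsID such as `"rs1"` returns `None` and would be resolved through dbSNP.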
- Run src/figures_factory/Fig6/preprocessing/format_spi1_bqtls.ipynb with the following inputs:
  - Get SPI1 binding QTLs from https://www.sciencedirect.com/science/article/pii/S0092867416303397?via%3Dihub (effect sizes from DragonNN; correspondence: Ashley)
  - Obtain ChromBPNet predictions using https://github.com/kundajelab/variant-scorer/ for the corresponding models
  - Obtain Enformer (reported) scores from https://console.cloud.google.com/storage/browser/dm-enformer/variant-scores/1000-genomes/enformer;tab=objects?prefix=&forceOnObjectsSortingFiltering=false
  - TODO: script to recompute Enformer scores
- Run src/figures_factory/Fig6/preprocessing/format_blood_traits.ipynb with the following inputs:
  - Fine-mapped blood traits (correspondence: Rosa Ma, Jesse Engreitz lab)
  - Backgrounds: For our control set, we collated common background variants from https://alkesgroup.broadinstitute.org/LDSCORE/baseline_v1.1_hg38_annots/. We annotated the genome into different regions using GENCODE v29, retaining only variants that were not in coding regions, transcription start sites (TSS), or splice sites. To identify alleles based on rsIDs, we used the database available at https://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/00-common_all.vcf.gz, filtering out variants without allele information. We separated entries with multiple alternate alleles, resulting in 11,903,173 control variants.
  - Obtain ChromBPNet predictions using https://github.com/kundajelab/variant-scorer/ for the corresponding models
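The last two background-variant steps (dropping variants without allele information and separating entries with multiple alternate alleles) can be illustrated with a short sketch. It assumes records have already been read from the dbSNP VCF into `(chrom, pos, rsid, ref, alt)` tuples, with `alt` holding the VCF-style comma-separated ALT field and `.` or an empty string marking a missing allele. The function name is hypothetical and this is not the notebook's code.

```python
def split_alt_alleles(records):
    """Yield one record per alternate allele, skipping records without alleles.

    `records` is an iterable of (chrom, pos, rsid, ref, alt) tuples where
    `alt` may contain several comma-separated alleles, as in a VCF ALT field.
    """
    missing = ("", ".")
    split_records = []
    for chrom, pos, rsid, ref, alt in records:
        if ref in missing or alt in missing:
            continue  # no usable allele information
        for allele in alt.split(","):
            if allele not in missing:
                split_records.append((chrom, pos, rsid, ref, allele))
    return split_records
```

A record such as `("1", 100, "rs1", "A", "G,T")` becomes two control variants, one for `G` and one for `T`.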