Bayesian tensor factorization for predicting clinical outcomes using integrated human genetics evidence
This repo contains detailed information on the data and methodology used in our work presented at the ICML 2022 Computational Biology workshop.
Workshop website | Extended abstract | Poster
Schematic representation of a 3-rank tensor with sparse data on genetics evidence and clinical outcomes.
We integrate three lines of human genetics evidence: rare disease, gene burden, and common disease. We also use NLP-based classification of clinical outcomes to label clinical "failures". All data used in this analysis were retrieved from Open Targets Platform release v22.06 (see Data downloads).
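To make the sparse tensor concrete, here is a minimal sketch of how observed labels could be arranged in a 3-way array with missing entries. The axis layout (gene x disease x evidence channel), all names, and all values are assumptions for illustration only; the README does not specify the tensor modes, and this is not the code used in the paper.

```python
import numpy as np

# Hypothetical axes: gene x disease x evidence channel. The README does not
# name the tensor modes, so this layout is an assumption.
genes = ["GENE_A", "GENE_B"]
diseases = ["DISEASE_X", "DISEASE_Y", "DISEASE_Z"]
channels = ["rare_disease", "gene_burden", "common_disease", "clinical_outcome"]

# Observed entries in coordinate form: (gene, disease, channel, binary label).
# These values are made-up placeholders, not real data.
observed = [
    ("GENE_A", "DISEASE_X", "rare_disease", 1),
    ("GENE_A", "DISEASE_X", "clinical_outcome", 0),
    ("GENE_B", "DISEASE_Z", "common_disease", 1),
]

g_idx = {g: i for i, g in enumerate(genes)}
d_idx = {d: i for i, d in enumerate(diseases)}
c_idx = {c: i for i, c in enumerate(channels)}

# NaN marks unobserved cells; only observed cells carry a 0/1 label.
tensor = np.full((len(genes), len(diseases), len(channels)), np.nan)
for g, d, c, y in observed:
    tensor[g_idx[g], d_idx[d], c_idx[c]] = y

mask = ~np.isnan(tensor)   # True where a label is available
density = mask.mean()      # fraction of observed cells
print(f"observed cells: {mask.sum()} of {mask.size} ({density:.1%})")
```

A factorization model would be fit only to the cells where `mask` is true and then used to score the unobserved cells.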
Data source | Positive label | Negative label |
---|---|---|
ClinGen | Definitive or Strong | Other classification |
Genomics England PanelApp | Green | Amber |
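
The rare disease rules in the table above can be sketched as a small labeling helper. The function and constant names are hypothetical, and the input strings are assumed to match the classification names shown in the table; ratings the table does not cover (for example anything other than Green or Amber for PanelApp) are left unlabeled rather than guessed.

```python
from typing import Optional

# Label rules transcribed from the rare disease table.
# Names below are hypothetical, for illustration only.
CLINGEN_POSITIVE = {"definitive", "strong"}
PANELAPP_LABELS = {"green": 1, "amber": 0}


def label_clingen(classification: str) -> int:
    """Return 1 for Definitive or Strong, 0 for any other classification."""
    return 1 if classification.strip().lower() in CLINGEN_POSITIVE else 0


def label_panelapp(rating: str) -> Optional[int]:
    """Return 1 for Green, 0 for Amber, None for ratings not in the table."""
    return PANELAPP_LABELS.get(rating.strip().lower())
```

For example, `label_clingen("Strong")` gives a positive label, while `label_panelapp("Amber")` gives a negative one.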
Data source | Positive label | Negative label |
---|---|---|
REGENERON | | Else |
AstraZeneca PheWAS Portal | | Else |
Genebass | | Else |
Data source | Positive label | Negative label |
---|---|---|
Locus-to-gene "L2G" score | Score | Score < |
If you find this work useful, please cite it as follows:
@misc{https://doi.org/10.48550/arxiv.2207.12538,
doi = {10.48550/ARXIV.2207.12538},
url = {https://arxiv.org/abs/2207.12538},
author = {Soylemez, Onuralp},
keywords = {Machine Learning (cs.LG), Genomics (q-bio.GN), Applications (stat.AP), FOS: Computer and information sciences, FOS: Biological sciences},
title = {Bayesian tensor factorization for predicting clinical outcomes using integrated human genetics evidence},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}