Script to make gnomad benign truth set (filtered on ExAC) and filter out variants on AA change #4

Closed
jimhavrilla opened this issue Jul 13, 2017 · 5 comments

Comments

@jimhavrilla
Contributor

The filter script is https://github.com/quinlan-lab/regionanalysis/blob/master/parvarfilter.py

Frequency and genes can also be filtered with https://github.com/quinlan-lab/regionanalysis/blob/master/secondfilter.py

It can be used like this:

python parvarfilter.py -x $DATA/clinvar-gnomad.txt -n clinvar -c -s patho -e gnomad -d genescreens/ad_genecards_clean.txt -f

This creates a file called $DATA/clinvar-patho-gnomad.txt (you have to add back a VCF header, but that's an easy fix).
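Adding the header back could look roughly like the sketch below. It assumes the output file holds header-less, tab-separated VCF records. The two header lines are the minimum the VCF specification requires; a real file would likely also want the ##INFO and ##contig lines copied from the source VCF. The name add_vcf_header is a hypothetical helper, not something in regionanalysis.

```python
import io

# Smallest valid header: the fileformat line plus the column line.
# Assumption: no sample columns are present in the records.
MINIMAL_HEADER = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)


def add_vcf_header(records, out, header=MINIMAL_HEADER):
    """Write `header`, then every record line from `records`, to `out`.

    Blank lines and any stray lines starting with '#' are skipped so the
    header is not duplicated.
    """
    out.write(header)
    for line in records:
        if not line.strip() or line.startswith("#"):
            continue
        out.write(line if line.endswith("\n") else line + "\n")


if __name__ == "__main__":
    # Illustrative record only, not real data.
    records = io.StringIO("1\t100\t.\tA\tG\t.\t.\tAC=1\n")
    out = io.StringIO()
    add_vcf_header(records, out)
    print(out.getvalue(), end="")
```

With real files, the same function can be called with two open file handles, one for the header-less text file and one for the new VCF.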

python parvarfilter.py -x $DATA/gnomad-exac.txt -n gnomad -s benign -e exac -d genescreens/ad_genecards_clean.txt -f

This creates a set of gnomAD benigns called gnomad-benign-exac.txt (gnomAD, benign set, filtered on ExAC). It filters on AA change/allele matching and, optionally, on the AD gene set.

as in:
https://github.com/quinlan-lab/regionanalysis/blob/master/pathocompare.sh

@brentp
Member

brentp commented Jul 13, 2017

It would be great to add more truth sets.
If you want to do this, first read this: https://github.com/quinlan-lab/pathoscore#truth-sets

Then have a look at make.sh and make.py for clinvar and open a PR.

@jimhavrilla
Contributor Author

jimhavrilla commented Jul 13, 2017 via email

@brentp
Member

brentp commented Jul 13, 2017

Create a make.sh that can be run as bash make.sh and that produces the .vcf.gz file(s) you'll be adding as a truth set. If you have both pathogenics and benigns, there will be 2 files. Either "pathogenic" or "benign" should be in the name of each resulting .vcf.gz.
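The naming rule can be checked with a small helper before opening a PR. This is only a sketch: truth_set_label is a hypothetical function, not part of pathoscore, and it assumes the match is case-sensitive and that a file name should contain exactly one of the two labels.

```python
import os

LABELS = ("pathogenic", "benign")


def truth_set_label(path):
    """Return 'pathogenic' or 'benign' based on the file name of `path`.

    Raises ValueError if the file is not a .vcf.gz or if its name does not
    contain exactly one of the two labels.
    """
    name = os.path.basename(path)
    if not name.endswith(".vcf.gz"):
        raise ValueError("expected a .vcf.gz file, got %r" % name)
    hits = [label for label in LABELS if label in name]
    if len(hits) != 1:
        raise ValueError(
            "file name %r must contain exactly one of %s" % (name, LABELS)
        )
    return hits[0]


if __name__ == "__main__":
    # Hypothetical output names, for illustration only.
    for candidate in ("gnomad-benign-exac.vcf.gz", "clinvar-pathogenic.vcf.gz"):
        print(candidate, "->", truth_set_label(candidate))
```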

@jimhavrilla
Contributor Author

jimhavrilla commented Jul 13, 2017 via email

@jimhavrilla
Contributor Author

I think the benign truth set on gnomAD is more or less done at this point with the "benchmark" sets. Filtering on AA change sounds like something I could maybe add in the future, but it may have to be done through vcfanno for speed's sake. Perhaps we should close this issue?
