Parsing results VCF gives different counts of TRUTH FN than summary #170
@BrianLohman I am also parsing the annotated VCF from hap.py because I found it confusing to have a position with
For example, here is my STDOUT:
Benchmarking Summary:
| Type | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score | TRUTH.TOTAL.TiTv_ratio | QUERY.TOTAL.TiTv_ratio | TRUTH.TOTAL.het_hom_ratio | QUERY.TOTAL.het_hom_ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| INDEL | ALL | 525370 | 519748 | 5622 | 543056 | 1386 | 0 | 1169 | 144 | 0.989299 | 0.997448 | 0.0 | 0.993357 | NaN | NaN | 1.528472 | 1.744304 |
| INDEL | PASS | 525370 | 519748 | 5622 | 543056 | 1386 | 0 | 1169 | 144 | 0.989299 | 0.997448 | 0.0 | 0.993357 | NaN | NaN | 1.528472 | 1.744304 |
| SNP | ALL | 3365341 | 3337928 | 27413 | 3341865 | 2187 | 0 | 1188 | 245 | 0.991854 | 0.999346 | 0.0 | 0.995586 | 2.100079 | 2.099354 | 1.581145 | 1.573484 |
| SNP | PASS | 3365341 | 3337928 | 27413 | 3341865 | 2187 | 0 | 1188 | 245 | 0.991854 | 0.999346 | 0.0 | 0.995586 | 2.100079 | 2.099354 | 1.581145 | 1.573484 |

SUM
Rather than simply counting values in the TRUTH or QUERY column, I joined the column values with a
And here is what my parsing script returns:
If I then do the following:
My output matches the hap.py STDOUT:
For the math to add up, I have to exclude all
But there are also similar redundancies with
Where my script currently breaks is that I end up with 31,620 more

Let me know if you have any thoughts. My processing module is more complex than what you shared, as it's a customized intermediate step in a much larger pipeline. However, if you'd be interested in using it, I could try to make a portable helper script once I know the math is correct.
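As a sanity check on the summary table in this comment, the reported metrics can be recomputed from the raw counts. This is a minimal sketch, not hap.py's own code: it assumes recall is TRUTH.TP / (TRUTH.TP + TRUTH.FN) and that the query true-positive count is QUERY.TOTAL minus QUERY.UNK minus QUERY.FP. The dictionary keys and the `metrics` helper are hypothetical names chosen for illustration.

```python
# Counts copied from the ALL rows of the summary table in this comment.
rows = {
    "INDEL": {"truth_total": 525370, "tp": 519748, "fn": 5622,
              "query_total": 543056, "fp": 1386, "unk": 0},
    "SNP": {"truth_total": 3365341, "tp": 3337928, "fn": 27413,
            "query_total": 3341865, "fp": 2187, "unk": 0},
}


def metrics(r):
    """Return (recall, precision, f1) from one row of counts."""
    recall = r["tp"] / (r["tp"] + r["fn"])
    # Assumption: query TPs are whatever is neither FP nor UNK.
    query_tp = r["query_total"] - r["unk"] - r["fp"]
    precision = query_tp / (query_tp + r["fp"])
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1


for name, r in rows.items():
    # TRUTH.TP and TRUTH.FN should partition TRUTH.TOTAL.
    assert r["tp"] + r["fn"] == r["truth_total"]
    recall, precision, f1 = metrics(r)
    print(name, round(recall, 6), round(precision, 6), round(f1, 6))
```

Under these assumptions the summary table is internally consistent, so any mismatch would come from how records in the annotated VCF are counted rather than from the summary arithmetic.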
I am also confused by the hap.py counts. When I sum all the TP:gm entries in the TRUTH column of the annotated VCF, the total is slightly more than the TRUTH.TP count reported by hap.py.
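For anyone comparing counts, a tally of the joined per-sample annotations may help show where the numbers diverge. The sketch below is not the parsing script from either post. It assumes the annotated VCF has sample columns named TRUTH and QUERY and that the FORMAT field carries BD, BK and BVT keys; `tally_decisions` and the demo records are hypothetical.

```python
from collections import Counter


def tally_decisions(lines, keys=("BD", "BK", "BVT")):
    """Count joined FORMAT values (e.g. 'TP:gm:SNP') per sample column."""
    counts = {}
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            counts = {s: Counter() for s in samples}
            continue
        fmt = fields[8].split(":")
        for sample, value in zip(samples, fields[9:]):
            parts = dict(zip(fmt, value.split(":")))
            # Missing keys are recorded as '.', like a missing VCF value.
            label = ":".join(parts.get(k, ".") for k in keys)
            counts[sample][label] += 1
    return counts


# Made-up records for illustration only, not real hap.py output.
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTRUTH\tQUERY",
    "chr1\t100\t.\tA\tG\t.\t.\t.\tGT:BD:BK:BVT\t0/1:TP:gm:SNP\t0/1:TP:gm:SNP",
    "chr1\t200\t.\tC\tT\t.\t.\t.\tGT:BD:BK:BVT\t0/1:FN:.:SNP\t.:.:.:NOCALL",
]
counts = tally_decisions(demo)
print(counts["TRUTH"])
print(counts["QUERY"])
```

On a real file the same function could be fed an open handle, for example `gzip.open(path, "rt")` for a compressed VCF. Note that this counts one label per record, which may not be how the summary counts are produced.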
Hello,
I'm interested in looking at the variants that hap.py calls as query false positives and truth false negatives. To do this, I am parsing the VCF that hap.py writes to gather the variants that are marked as such. I get the correct number of query false positives and truth true positives, but I find fewer truth false negatives than the summary shows. Are false negatives counted in some way other than by 'FN' appearing in the 'BD' portion of the FORMAT field?
I am running hap.py with the Docker container, and even though I added -V, only a single VCF is written, not two as in #70.
hap.py command:
and summary table:
my parsing program:
And the results of the parsing program:
Here query_fp == QUERY.FP and truth_tp == TRUTH.TP, but truth_fn is less than TRUTH.FN by 1,080. Adding in FP.gt doesn't correct this difference, and adding in the variants that my logic doesn't catch does not produce the correct sum either.
Thanks
Brian