Ypredict is a Python-based software package that predicts Y chromosome haplogroups. It uses a rank calculation to automatically find the most likely Y haplogroup. Each haplogroup receives one of two marks (T or F) according to the calling state of its SNPs. For example, suppose haplogroup O2a1a1a2a1 in the ISOGG (https://isogg.org/tree/) haplogroup tree has six SNPs, five of which were observed. Because the ratio 5/6 is >= 0.2, the haplogroup gets the T mark; otherwise it would get the F mark. For each haplogroup, I count the T marks (n_T) and the F marks (n_F) along the path from the root 'Y' haplogroup to that haplogroup, and compute rank = (n_T**2)/(n_T + n_F). If several haplogroups share the same rank, the one with the largest n_T is taken as the most likely haplogroup. If both rank and n_T are tied, one of the tied haplogroups is selected at random.
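The ranking rule above can be sketched in a few lines of Python. This is only an illustration of the description, not the code in ypredict.py: the function names (mark, rank, best_haplogroup), the layout of the input dictionary, and the example counts are all hypothetical.

```python
import random

THRESHOLD = 0.2  # ratio cutoff quoted in the description above


def mark(observed, total, threshold=THRESHOLD):
    """Return True (T mark) if observed/total reaches the threshold, else False (F mark)."""
    return total > 0 and observed / total >= threshold


def rank(n_t, n_f):
    """rank = n_T**2 / (n_T + n_F); defined as 0.0 for an empty path."""
    denom = n_t + n_f
    return (n_t ** 2) / denom if denom else 0.0


def best_haplogroup(paths, rng=random):
    """Pick the most likely haplogroup.

    paths maps a haplogroup name to a list of (observed, total) SNP counts,
    one pair per node on the path from the root 'Y' to that haplogroup
    (a hypothetical input layout chosen for this sketch).
    """
    scored = {}
    for hap, nodes in paths.items():
        marks = [mark(obs, tot) for obs, tot in nodes]
        n_t = sum(marks)
        n_f = len(marks) - n_t
        scored[hap] = (rank(n_t, n_f), n_t)
    # Tuples compare by rank first, then by n_T, matching the tie rule above.
    top = max(scored.values())
    candidates = [hap for hap, score in scored.items() if score == top]
    # Any remaining tie is broken at random.
    return rng.choice(candidates), scored


# Made-up counts, for illustration only.
example_paths = {
    "O2a1a1a2a1": [(3, 3), (2, 4), (5, 6)],  # marks T, T, T
    "O2a1a1a2a2": [(3, 3), (2, 4), (0, 5)],  # marks T, T, F
}
winner, scores = best_haplogroup(example_paths)
```

With these made-up counts the first haplogroup has n_T = 3 and n_F = 0, so its rank is 9/3 = 3.0, while the second has n_T = 2 and n_F = 1, giving 4/3, so the first one is selected.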
- The current version is 0.0.1
- Biopython (https://biopython.org/wiki/Download)
- GATK (https://software.broadinstitute.org/gatk/download/)
Download the Y haplogroup tree from ISOGG, then filter the SNPs with snpfilter.py. This step removes hotspot and back-mutation SNPs and generates two files, map.json and ref_vcf.gz. It also requires a Y chromosome FASTA file (test/Y.fasta), which can be downloaded from NCBI or extracted from the hg38 reference genome using bedtools.
snpfilter.py -snp snp14.3.csv -y Y.fasta
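If bedtools is not at hand, the Y chromosome record can also be pulled out of a multi-record FASTA file with plain Python. The sketch below is an alternative to the bedtools route, not part of ypredict; the function name extract_record is hypothetical, and it assumes the Y chromosome header ID in your hg38 file is chrY (the same name used for --intervals in the GATK command).

```python
def extract_record(fasta_in, fasta_out, name="chrY"):
    """Copy the FASTA record whose header ID equals `name` into fasta_out.

    Returns the number of sequence characters written (0 if not found).
    """
    n_bases = 0
    keep = False
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # The record ID is the first whitespace-delimited token of the header.
                fields = line[1:].split()
                keep = bool(fields) and fields[0] == name
            if keep:
                dst.write(line)
                if not line.startswith(">"):
                    n_bases += len(line.strip())
    return n_bases
```

For example, extract_record("hg38.fa", "Y.fasta") would write the chrY record to Y.fasta, assuming hg38.fa is your local copy of the reference genome. A return value of 0 means no header matched, which usually points to a different naming scheme (for example Y instead of chrY).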
This step uses the ref_vcf.gz file generated in step 1 to call SNPs with the UnifiedGenotyper module of GATK 3.8. The hg38 reference genome must be used in this step.
java -Xmx32g -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R hg38.fa -I *.bam -o y.vcf.gz --intervals chrY -ploidy 1 --output_mode EMIT_ALL_SITES --genotyping_mode GENOTYPE_GIVEN_ALLELES --alleles ref_vcf.gz
The Y chromosome haplogroup is predicted by ypredict.py. In this step, the script automatically outputs the most likely haplogroup. The final result is written to ypredict.txt, and more detailed output is written to ystatistics.csv.
ypredict.py -vcf y.vcf.gz -s hfspecial.xlsx -m map.json
If you need to update the Y haplogroup tree file downloaded from ISOGG, redo step 1 to get an updated map.json.
git clone https://github.com/N-damo/ypredict-master.git
python setup.py install
or pip install ypredict