no PASS calls in human test data #32

Open
yuliamostovoy opened this issue Aug 15, 2023 · 3 comments

@yuliamostovoy

Hi,

I'm having some trouble testing TLDR on human HiFi data. I'm using the provided teref.human.fa file and testing the program on a 6 Mb region that, I know from other sources, contains some solid Alu and SVA insertions in this sample (I can manually verify them in IGV using the BAM file that I'm inputting to TLDR). The sample was sequenced to 30x. The BAM was aligned with pbmm2, which in theory (?) should be equivalent to minimap2, and it uses soft-clipping. My run looks like this:
tldr -b test_chr1_24000000-30000000.bam -e ~/local/tldr/ref/teref.human.fa -r ~/work/ref/hg38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa

and the output is attached.
test_chr1_24000000-30000000.table.txt
Thanks!

@yuliamostovoy
Author

After writing that, I tried running bamToFastq on my BAM file and realigning with minimap2, and now TLDR works as expected (it found most of the Alus, although not the SVA; in any case, the output now looks reasonable). There must be something about the BAM files produced by pbmm2 that TLDR isn't expecting. We have a bunch of samples that were mapped with pbmm2, so is this something that's fixable without realigning all those reads?
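One way to compare the two BAMs is to tally how each aligner clips reads. Below is a minimal Python sketch, assuming the alignments are first dumped as SAM text with `samtools view`. `clip_counts` is a hypothetical helper, not part of TLDR, and clipping style and supplementary records are only some of the things that can differ between pbmm2 and minimap2 output; nothing here confirms them as the cause of the missing calls.

```python
import re
import sys
from collections import Counter

# CIGAR operations as defined by the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
SUPPLEMENTARY = 0x800  # SAM FLAG bit marking a supplementary alignment


def clip_counts(sam_lines):
    """Tally soft clips (S), hard clips (H) and supplementary records in SAM text.

    Hypothetical diagnostic helper; it only inspects the FLAG and CIGAR columns.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        if fields[5] == "*":
            counts["no_cigar"] += 1  # e.g. unmapped reads
            continue
        counts["records"] += 1
        ops = {op for _, op in CIGAR_RE.findall(fields[5])}
        if "S" in ops:
            counts["soft_clipped"] += 1
        if "H" in ops:
            counts["hard_clipped"] += 1
        if int(fields[1]) & SUPPLEMENTARY:
            counts["supplementary"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass a SAM file path, or "-" to read SAM text from stdin.
    src = sys.stdin if sys.argv[1] == "-" else open(sys.argv[1])
    with src:
        print(dict(clip_counts(src)))
```

For example, `samtools view pbmm2.bam | python clip_counts.py -`, then the same on the minimap2 BAM (file names are placeholders), to compare the tallies side by side.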

@adamewing
Owner

adamewing commented Aug 16, 2023

Ah, that's interesting. I haven't tried .bam files from pbmm2 yet. Do you know of a public dataset that uses it? If not, are you able to share a chunk of a .bam file around one of the aforementioned Alu insertions? (i.e. if it's public cell line data or something else that can be shared, not patient data)

Regarding the SVA that's still being missed: if you have a look in IGV, are there reads that completely span the insertion?

@yuliamostovoy
Copy link
Author

Yes, no problem: I'm using a 1000 Genomes sample for testing. I'm attaching the pbmm2 BAM file for a region of +/- 20 kb around an Alu (which is detected from the same reads aligned with minimap2). Thanks for your help!

The SVA is fully spanned by multiple reads, and TLDR detects an 'NA' insertion of 4 bp there, but not the full ~2,800 bp SVA. I'm also including that region realigned with minimap2 in case you want to take a look.
bamfiles.tar.gz
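To check whether the spanning reads carry the SVA as a single long CIGAR insertion (rather than, say, a soft clip plus a supplementary alignment), here is a minimal sketch that scans SAM records for long `I` operations and reports where they fall on the reference. The helper name `long_insertions` and the 1,000 bp default threshold are assumptions chosen for illustration; this is not how TLDR itself detects insertions.

```python
import re
import sys

# CIGAR operations as defined by the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance the reference coordinate


def long_insertions(sam_lines, min_len=1000):
    """Yield (read_name, chrom, ref_pos, ins_len) for CIGAR insertions >= min_len.

    ref_pos is the 1-based reference position of the last aligned base
    before the insertion. Hypothetical diagnostic helper.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or fields[5] == "*":
            continue  # incomplete record or no alignment
        name, chrom, pos, cigar = fields[0], fields[2], int(fields[3]), fields[5]
        ref = pos - 1  # last reference base consumed so far
        for length, op in CIGAR_RE.findall(cigar):
            length = int(length)
            if op == "I" and length >= min_len:
                yield name, chrom, ref, length
            elif op in REF_CONSUMING:
                ref += length


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass a SAM file path, or "-" to read SAM text from stdin.
    src = sys.stdin if sys.argv[1] == "-" else open(sys.argv[1])
    with src:
        for hit in long_insertions(src):
            print(*hit, sep="\t")
```

For example, `samtools view region.bam | python long_insertions.py -` (file name is a placeholder) run on both the pbmm2 and minimap2 BAMs would show whether each aligner places the ~2,800 bp sequence inside a single alignment.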
