wrote 0 records to *table.txt #14

Open
WeijiaSu opened this issue May 5, 2021 · 8 comments

Comments

@WeijiaSu

WeijiaSu commented May 5, 2021

Hi,
I am trying to run TLDR on a human nanopore dataset. I got thousands of clusters for each chromosome, for example:
2021-05-03 14:03:21,455 loaded 5734 clusters from blood_gDNA.fastq-TLDR/chr13.pickle

but no records were written to the result table:
finished blood_gDNA.fastq-TLDR/chr13.pickle. wrote 0 records to blood_gDNA.fastq-TLDR.table.txt

This happened for all chromosomes. Do you have any idea what could cause it?
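To confirm that every chromosome shows the same pattern, the two log messages quoted above can be paired up per pickle file. The following is a minimal sketch and not part of tldr: the function name `summarize_tldr_log` is hypothetical, and it assumes the log lines look exactly like the two examples above (other tldr versions may word them differently).

```shell
# Pair "loaded N clusters from X" with "finished X. wrote M records to ..."
# Reads a captured tldr log on stdin and prints: pickle <TAB> clusters <TAB> records.
# Assumption: the log wording matches the two lines quoted in this thread.
summarize_tldr_log() {
  awk '
    / loaded [0-9]+ clusters from / {
      for (i = 1; i <= NF; i++)
        if ($i == "loaded") loaded[$(i + 4)] = $(i + 1)
    }
    /finished .* wrote [0-9]+ records to / {
      for (i = 1; i <= NF; i++) {
        # the pickle path is followed by a period in the log line; strip it
        if ($i == "finished") { p = $(i + 1); sub(/\.$/, "", p) }
        if ($i == "wrote") wrote[p] = $(i + 1)
      }
    }
    END {
      # "NA" marks a pickle that was loaded but never reported as finished
      for (p in loaded)
        printf "%s\t%s\t%s\n", p, loaded[p], (p in wrote ? wrote[p] : "NA")
    }
  ' | sort
}
```

Running `summarize_tldr_log < tldr.log` (where `tldr.log` is a hypothetical file holding the captured log) prints one tab-separated line per pickle, which makes it easy to see whether any chromosome wrote a non-zero number of records.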

Thanks for your help.

Weijia

@clemgoub

clemgoub commented Mar 9, 2023

I'm having the same issue!

I was testing on simulated insertions on human chr 22 (~10 simulated Alu/L1/SVA insertions, 10 simulated non-TE insertions). I can recover these SVs with sniffles2. TLDR finds clusters, but writes 0 records. When I apply --detail_output I see 20 fasta files (plus 20 bams), but the fasta files contain only headers and no sequences.
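A quick way to check which fasta files contain headers but no sequence is sketched below. This is only an illustration: the function name `list_headers_only_fastas` is hypothetical, and the files are passed explicitly because the layout of the --detail_output directory is not described in this thread.

```shell
# Print every given FASTA file that has at least one ">" header
# but no non-blank sequence line.
list_headers_only_fastas() {
  for f in "$@"; do
    [ -e "$f" ] || continue
    # awk exits 0 only when headers were seen and no sequence lines were seen
    if awk '/^>/ { h++; next } NF { s++ } END { exit !(h > 0 && s == 0) }' "$f"; then
      printf '%s\n' "$f"
    fi
  done
  return 0
}
```

Called with the fasta files from a detail output directory as arguments, it lists only the files that match the symptom described above.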

I'm looking forward to your feedback!

Thanks,

Clément

@WeijiaSu
Author

WeijiaSu commented Mar 9, 2023

I checked a few reads from the clusters, and I think they are real insertions. I'm not sure why they were not included in the final output.

@adamewing
Owner

Hi, sorry to hear it's not returning anything. Are either of you able to send a .bam file with known insertions that aren't being picked up?

@clemgoub

Hi Adam! No problem!

I think in my case it comes from the read source. I haven't looked in detail yet (I'll compile the bams, vcf and all the details first thing tomorrow), but it seems to work with a bam made from hifi reads and won't report anything with a bam made from ONT reads (each read set is simulated from the same variants). The reads were mapped with minimap2 using either the ont or hifi preset. More details tomorrow! Thanks for your help!

Clément

@clemgoub

Dear Adam,

Here are the data with some explanation. In the example I'm sending you, I simulated a chromosome 22 (based on the hg38 ref) with 14 TE insertions (10 Alu, 3 L1 and 1 SVA), 6 TE deletions (4 Alu, 1 L1, 1 SVA), 10 random insertions and 10 random deletions. These simulated variants can be found in the file sim12.vcf.

data: https://drive.google.com/file/d/1yihbwah1xj-_hC_M28a4HAyRH6rJ7JQK/view?usp=sharing

From the simulated genome sim12.simseq.genome.fa.gz, I simulated 10X ONT reads with pbsim3 (using the ONT error model) or 10X hifi reads with pbsim3 (using the PB Sequel II error model and 10 passes per read) + ccs (to make the hifi consensus). Each read set was mapped to the reference using minimap2, and then I ran tldr.

For ONT:

minimap2 -ax map-ont hg38.p14.chr22.fa sim12_0001.fastq.gz | samtools sort -m4G -@4 -o sim.bam  -
tldr -b sim.bam -e ~/bin/tldr/ref/teref.ont.human.fa -r hg38.p14.chr22.fa -p 2

For hifi:

minimap2 -ax map-hifi hg38.p14.chr22.fa sim12_0001.hifi.fastq.gz | samtools sort -m4G -@4 -o sim.bam  -
tldr -b sim.bam -e ~/bin/tldr/ref/teref.ont.human.fa -r hg38.p14.chr22.fa -p 2

Note that I used teref.ont.human.fa for both runs. Would you recommend using teref.human.fa instead with hifi? In any case, tldr reported variants for hifi, but not for ONT.

Here is the detail of each file:

sim_tldr
├── hg38.p14.chr22.fa <-- ref genome
├── sim12.simseq.genome.fa.gz <-- simulated genome
├── sim12.vcf <-- vcf for the simulated genome (expected)
├── sim_hifi 
│   ├── sim.bam <-- hifi reads alignments
│   └── sim.table.txt <-- tldr output
└── sim_ont
    ├── sim.bam <-- ont reads alignments
    └── sim.table.txt <-- tldr output

While tldr returns 19 candidates (9 PASS) for the hifi data, nothing is reported for the ONT alignments.
Using sniffles2 (sniffles --minsvlen 100 --reference hg38.p14.chr22.fa --input sim.bam --snf sim.snf --vcf sim.vcf) on each bam, I can recover most of the SVs (TE and random SVs), so I assume that my ONT bam is valid. Finally, I know that tldr doesn't report DELs, so I understand they would not show up anyway.
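The candidate and PASS counts can be read off each sim.table.txt with a small helper such as the sketch below. It rests on two assumptions about the table layout that should be checked against the actual output: the file is tab-delimited with a single header line, and passing records carry a field whose value is exactly PASS. The function name `count_tldr_records` is hypothetical.

```shell
# Print "<total records> <TAB> <records with a PASS field>" for a tldr table.
# Assumptions: tab-delimited, one header line, filter value is exactly "PASS".
count_tldr_records() {
  awk -F '\t' '
    NR == 1 { next }   # skip the assumed header line
    NF == 0 { next }   # ignore blank lines
    {
      total++
      for (i = 1; i <= NF; i++)
        if ($i == "PASS") { pass++; break }
    }
    END { printf "%d\t%d\n", total, pass }
  ' "$1"
}
```

Running it on `sim_hifi/sim.table.txt` and `sim_ont/sim.table.txt` prints two tab-separated numbers per file (total records and PASS records), so the two runs can be compared side by side.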

Thanks for your help!

Clément

@CWYuan08

CWYuan08 commented Mar 3, 2024

Dear @clemgoub,

Thank you for your post!
I'm having the same issue. Have you found out how to fix it?

Best regards,
CW

@clemgoub

clemgoub commented Mar 4, 2024

Hi @CWYuan08,

Unfortunately no. My guess is that TLDR didn't like my simulated reads. I ended up only testing TLDR on real data.

Cheers,

Clément

@jiyonghao-cpu

Dear all,
I have the same problem, and I do not know how to deal with it.


5 participants