Hi,
I found some weird behaviour when running paired-end data.
To test some stuff, I created some simulated datasets where I know all the parameters like repertoire size, size of each clonotype, V gene, J gene etc.
I created the FASTQ files as paired-end reads and ran catt with the following command:
catt --f1 test_R1.fastq --f2 test_R2.fastq -o test_out -t 20
The unintended behaviour I experienced can nicely be seen in one of my samples with a repertoire of one clonotype with 10.000 clones.
Catt returns 3 clones with exactly the same NNseq:
AAseq,NNseq,Prob,Vregion,Jregion,Dregion,Frequency
CSARGDRGLSYNEQFF,TGCAGTGCTCGGGGGGACAGGGGGCTATCCTACAATGAGCAGTTCTTC,0.00016,TRBV20-1*02,TRBJ2-1*01,TRBD1*01,6727
CSARGDRGLSYNEQFF,TGCAGTGCTCGGGGGGACAGGGGGCTATCCTACAATGAGCAGTTCTTC,0.00016,TRBV20-1*06,TRBJ2-1*01,TRBD1*01,6727
CSARGDRGLSYNEQFF,TGCAGTGCTCGGGGGGACAGGGGGCTATCCTACAATGAGCAGTTCTTC,0.00016,TRBV20-1*03,TRBJ2-1*01,TRBD1*01,6546
When combining the "different" clonotypes into one, the frequencies sum to 20.000 clones instead of 10.000.
So it seems like catt is counting each clone twice.
I thus merged the paired end files to a single file using pear and repeated the analysis.
This returned the same results as the paired-end run, only the frequencies are different:
AAseq,NNseq,Prob,Vregion,Jregion,Dregion,Frequency
CSARGDRGLSYNEQFF,TGCAGTGCTCGGGGGGACAGGGGGCTATCCTACAATGAGCAGTTCTTC,0.00016,TRBV20-1*06,TRBJ2-1*01,TRBD1*01,3379
CSARGDRGLSYNEQFF,TGCAGTGCTCGGGGGGACAGGGGGCTATCCTACAATGAGCAGTTCTTC,0.00016,TRBV20-1*02,TRBJ2-1*01,TRBD1*01,3348
CSARGDRGLSYNEQFF,TGCAGTGCTCGGGGGGACAGGGGGCTATCCTACAATGAGCAGTTCTTC,0.00016,TRBV20-1*03,TRBJ2-1*01,TRBD1*01,3273
For the merged "single-end" files the clones sum up to the expected 10.000.
What confuses me a little, though, is that the counts aren't exactly half of the ones in the paired-end run.
I guess that there must be some issue when counting the frequency for the paired-end samples.
Would be awesome if you could have a look!
Thanks!
If I understand correctly, CATT defines a TCR clone by its V and J gene assignments in addition to the CDR3 amino acid sequence. The problem is that the V and J gene assignment is sometimes inaccurate, so the same TCR can end up split across several rows.
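As a workaround on the user side, rows that share the same CDR3 can be collapsed and their frequencies summed. Below is a minimal sketch in Python, assuming the output is a CSV with the columns shown in the issue; the helper name `collapse_by_cdr3` is made up for illustration and is not part of catt. It only merges rows after the fact and does not address why the paired-end counts are doubled.

```python
import csv
import io
from collections import defaultdict

NN = "TGCAGTGCTCGGGGGGACAGGGGGCTATCCTACAATGAGCAGTTCTTC"
HEADER = "AAseq,NNseq,Prob,Vregion,Jregion,Dregion,Frequency\n"

# Rows copied from the paired-end run reported in the issue.
PAIRED = HEADER + "".join(
    f"CSARGDRGLSYNEQFF,{NN},0.00016,{v},TRBJ2-1*01,TRBD1*01,{n}\n"
    for v, n in [("TRBV20-1*02", 6727), ("TRBV20-1*06", 6727), ("TRBV20-1*03", 6546)]
)

# Rows copied from the PEAR-merged single-end run reported in the issue.
MERGED = HEADER + "".join(
    f"CSARGDRGLSYNEQFF,{NN},0.00016,{v},TRBJ2-1*01,TRBD1*01,{n}\n"
    for v, n in [("TRBV20-1*06", 3379), ("TRBV20-1*02", 3348), ("TRBV20-1*03", 3273)]
)


def collapse_by_cdr3(csv_text):
    """Sum Frequency over rows sharing (AAseq, NNseq), ignoring V/J/D calls.

    Hypothetical helper for illustration only.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[(row["AAseq"], row["NNseq"])] += int(row["Frequency"])
    return dict(totals)


if __name__ == "__main__":
    for label, text in [("paired-end", PAIRED), ("merged", MERGED)]:
        for (aa, _nn), freq in collapse_by_cdr3(text).items():
            print(label, aa, freq)
```

With the numbers from the issue, this gives a single clonotype with a total of 20000 for the paired-end run and 10000 for the merged run, matching the sums described above.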