Handling BAM files with polyA trimmed #256

omarelgarwany · 2024-11-05T13:18:00Z

Hello

I have a question/issue regarding the polyA requirement. So far, I have been using BAM files that were aligned using the IsoSeq pipeline. As part of this pipeline, polyA tails are trimmed (isoseq refine). I would like to keep using the IsoSeq pipeline because it handles a few other single-cell-related tasks that I think IsoQuant does not (e.g. extracting barcode tags, barcode correction, real cell calling). However, because the polyA tails are trimmed, I keep getting a warning that too few reads have polyA tails (< 1%).

I am not worried about whether these reads had polyA tails, because isoseq refine requires at least 20 As. I know that IsoQuant complains because the tails have been trimmed. I have considered adding the tails back, but I am not sure it is safe to add these sequences back to the mapped BAM file, as that would require modifying the CIGAR string and potentially other related BAM tags. Are there any other workarounds? While I agree that the polyA requirement is a sensible check, I wish it were a little more flexible. I was thinking that if there were an option to check for a user-defined tag that specifies the length of the polyA tail (e.g. PA:i:34), then this information could be added back more easily, without having to modify the sequence and go through all the trouble that comes with it.
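To make the concern concrete, here is a minimal sketch of what re-attaching a tail to an already aligned record would involve. It works on plain strings rather than a real BAM library, and the function name `restore_polya` and the placeholder quality character are hypothetical. It assumes the tail is added as a soft clip and that the record has no hard clips. For a read mapped to the reverse strand, the stored sequence is reverse-complemented, so the tail becomes a run of Ts at the start.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def restore_polya(seq, qual, cigar, tail_len, is_reverse=False, tail_qual="!"):
    """Return (seq, qual, cigar) with a soft-clipped polyA tail re-attached.

    Hypothetical helper for illustration only; tail_qual is a placeholder
    quality character because the original base qualities are gone.
    """
    if tail_len <= 0:
        return seq, qual, cigar

    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if any(op == "H" for _, op in ops):
        # Hard-clipped bases are absent from SEQ, so the tail cannot be
        # placed correctly; this sketch refuses instead of guessing.
        raise ValueError("records with hard clips are not handled")

    if is_reverse:
        # Reverse-strand records store the reverse complement of the read,
        # so the 3' polyA tail appears as Ts at the start of SEQ.
        seq = "T" * tail_len + seq
        qual = tail_qual * tail_len + qual
        if ops and ops[0][1] == "S":
            ops[0] = (ops[0][0] + tail_len, "S")
        else:
            ops.insert(0, (tail_len, "S"))
    else:
        seq = seq + "A" * tail_len
        qual = qual + tail_qual * tail_len
        if ops and ops[-1][1] == "S":
            ops[-1] = (ops[-1][0] + tail_len, "S")
        else:
            ops.append((tail_len, "S"))

    new_cigar = "".join(f"{n}{op}" for n, op in ops)
    return seq, qual, new_cigar


print(restore_polya("ACGT", "IIII", "4M", 3))
```

Because soft-clipped bases do not consume reference positions, the alignment start is unchanged in this sketch. The sequence, qualities and CIGAR still all have to be edited consistently for every record, which is the overhead a tail-length tag would avoid.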

Do you have any thoughts on this?

Best wishes
Omar


andrewprzh · 2024-11-12T22:24:03Z

Dear @omarelgarwany

Yes, you are not the first one to ask, since many people use both tools.
PolyA tails are needed only to ensure the correct positions of discovered novel transcripts. When the percentage of reads with polyA tails is low, IsoQuant does not require them.
Moreover, if you are sure that every read initially had a polyA tail, it's safe to use the --fl_data flag. I think the results should be very similar, if not identical, to those that would have been obtained with non-truncated reads. So I doubt modifying the BAM files is necessary.
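As a rough sketch of that suggestion, the snippet below assembles an IsoQuant command line with --fl_data for a PacBio BAM. All file names are placeholders, the helper `isoquant_fl_command` is hypothetical, and the choice of `pacbio_ccs` as the data type is an assumption about the input. The command is only printed here, not executed.

```python
import shlex


def isoquant_fl_command(bam, reference, genedb, out_dir):
    """Build an IsoQuant argument list that treats reads as full-length.

    Hypothetical helper; the paths passed in are placeholders.
    """
    return [
        "isoquant.py",
        "--reference", reference,
        "--genedb", genedb,
        "--bam", bam,
        "--data_type", "pacbio_ccs",  # assumed data type for IsoSeq reads
        "--fl_data",                  # reads are full-length transcripts
        "-o", out_dir,
    ]


cmd = isoquant_fl_command(
    bam="mapped.bam",          # placeholder: polyA-trimmed, aligned reads
    reference="genome.fa",     # placeholder reference genome
    genedb="annotation.gtf",   # placeholder gene annotation
    out_dir="isoquant_out",    # placeholder output directory
)
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.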

As far as I know, the IsoSeq pipeline can also provide information on whether a read had a polyA tail or not, right? We planned to incorporate that into IsoQuant as well, but we are a bit short on manpower now.

Best
Andrey

andrewprzh added the question Further information is requested label Nov 18, 2024
