"NoneType" TypeError when clustering reads #27

Open
malonematt opened this issue May 24, 2022 · 11 comments

@malonematt

Hi Adam,

Thanks for all of the help you've given me using your software.

I ran into the following error:

2022-05-24 14:49:35,184 tldr started with command: /home1/malonema/.local/bin/tldr -b bams/OF1_sorted_mappings.bam -r resources/Masked_Genome_061021.fa -e none -p 20 -o results/OF1.tldr --detail_output --extend_consensus 2000
2022-05-24 14:49:35,184 output basename: results/OF1.tldr
2022-05-24 14:49:35,636 "None" passed to -e/--elts, running without TE reference
2022-05-24 14:49:36,409 writing clusters to results/OF1.tldr/JAAVVJ010000099.1.pickle
2022-05-24 14:49:37,158 writing clusters to results/OF1.tldr/JAAVVJ010009971.1.pickle
2022-05-24 14:49:39,252 writing clusters to results/OF1.tldr/CM025019.1.pickle
...
2022-05-24 14:52:07,881 writing clusters to results/OF1.tldr/JAAVVJ010009963.1.pickle
2022-05-24 14:52:08,399 loaded 504 clusters from results/OF1.tldr/CM025008.1.pickle
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/spack/apps2/linux-centos7-x86_64/gcc-11.2.0/python-3.9.6-5amy32qig2nbj7ti7ehht3y2vbmdc2j7/lib/python3.9/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home1/malonema/.local/bin/tldr", line 1525, in process_cluster
qual = qual[::-1]
TypeError: 'NoneType' object is not subscriptable
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home1/malonema/.local/bin/tldr", line 2128, in <module>
main(args)
File "/home1/malonema/.local/bin/tldr", line 1907, in main
processed_clusters.append(res.get())
File "/spack/apps2/linux-centos7-x86_64/gcc-11.2.0/python-3.9.6-5amy32qig2nbj7ti7ehht3y2vbmdc2j7/lib/python3.9/multiprocessing/pool.py", line 771, in get
raise self._value
TypeError: 'NoneType' object is not subscriptable

Any idea what might be causing this?

@malonematt
Author

From what I can tell this might be happening because on line 1521 qual is set to read.qual:

`for read in bam.fetch(cluster.chrom(), out_start, out_end):
    if not read.is_secondary and not read.is_supplementary:
        seq = read.seq
        qual = read.qual

        if read.is_reverse:
            seq = rc(seq)
            qual = qual[::-1]`

The only other mention of read.qual is on line 1650, when ins_read is being defined:

ins_read = InsRead(bam.filename.decode(), read.reference_name, q_start, q_end, r_start, r_end, read.qname, read.seq, read.qual, read.mapq, is_ins, is_clip, clip_end, phase)

I'm still very new to Python. Is this issue caused because read.qual has not been defined yet?
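
For reference, a small sketch of the difference between a name that was never defined and an attribute that exists but holds None. The `Read` class and `reverse_qual` function are hypothetical stand-ins for illustration, not pysam's or tldr's code; only the failing slice is taken from the traceback above.

```python
class Read:
    """Hypothetical stand-in for an alignment record (not pysam's class)."""

    def __init__(self, seq, qual):
        self.seq = seq
        self.qual = qual


def reverse_qual(read):
    # Same operation as the failing statement: slicing works on str, not on None.
    return read.qual[::-1]


with_qual = Read("ACGT", "IIH#")
without_qual = Read(None, None)  # the attribute exists, but its value is None

print(reverse_qual(with_qual))

try:
    reverse_qual(without_qual)
except TypeError as err:
    # A defined attribute holding None raises TypeError, not AttributeError.
    message = str(err)
    print(message)
```

So an error of this kind would point to the value being None for some reads rather than to the attribute never having been set.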

@malonematt
Author

I checked my bam with samtools view file.bam, and the reads have quality scores (or at least some do; I haven't checked whether every aligned read does yet). Could it be that some of the reads don't have quality scores?
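
One way to check every record rather than a sample: in SAM text, an absent SEQ or QUAL is written as `*` in columns 10 and 11. The sketch below scans SAM lines for such records. The function name is hypothetical and the three example lines are made up for illustration, not taken from the bam in this issue.

```python
def reads_missing_seq_or_qual(sam_lines):
    """Yield (read_name, flag) for records whose SEQ or QUAL column is '*'.

    Hypothetical helper: expects SAM text lines such as those printed by
    `samtools view file.bam`.
    """
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # blank or header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if seq == "*" or qual == "*":
            yield name, flag


# Made-up records for illustration only.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t16\tchr1\t200\t60\t4M\t*\t0\t0\t*\t*",
]
missing = list(reads_missing_seq_or_qual(example))
print(missing)
```

On a real file the lines could come from the output of samtools view instead of the example list. The flag is reported alongside the name so that primary, secondary, and supplementary records can be told apart.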

@adamewing
Owner

Hi, sorry for the delay. It's likely that one or more read alignment records are missing quality scores (and sequences), as I've seen this come up in other software with minimap2 .bams.

I've pushed a fix that will skip the offending alignments at that point and complain about it a bit so you can track down the read if you like: ae3cdb8

It's possible you'll hit this elsewhere in the code though so let me know if it comes up again.

Regarding your q about read.qual, that's set by pysam when the read is parsed into their AlignedSegment class (if I have the name right).
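
A rough sketch of the kind of guard described above, for anyone following along. This is not the code in ae3cdb8: the `Read` tuple, the `rc` implementation, and the `oriented_seq_qual` function are hypothetical stand-ins, and only the skip-and-report behaviour comes from the description.

```python
import logging
from collections import namedtuple

logger = logging.getLogger(__name__)

# Hypothetical stand-in for the few read attributes used here (not pysam's class).
Read = namedtuple("Read", "qname seq qual is_reverse")

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def rc(seq):
    """Reverse complement; a simple stand-in for a helper of the same name."""
    return seq.translate(COMPLEMENT)[::-1]


def oriented_seq_qual(reads):
    """Return (name, seq, qual) per read, skipping reads that lack seq or qual."""
    kept = []
    for read in reads:
        if read.seq is None or read.qual is None:
            # Report the read name so it can be tracked down later.
            logger.warning("skipped a read without seq/qual: %s", read.qname)
            continue
        seq, qual = read.seq, read.qual
        if read.is_reverse:
            seq, qual = rc(seq), qual[::-1]
        kept.append((read.qname, seq, qual))
    return kept


reads = [
    Read("r1", "AACG", "IIH#", True),
    Read("r2", None, None, True),  # would previously have raised TypeError
]
result = oriented_seq_qual(reads)
print(result)
```

Checking for None before the slice keeps the worker process alive, and the warning leaves a record of which reads were dropped.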

@malonematt
Author

I just ran it after double-checking that the code was updated with your fix and it still threw the same error:

2022-05-25 18:14:48,448 loaded 504 clusters from results/OF1.tldr/CM025008.1.pickle
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/spack/apps2/linux-centos7-x86_64/gcc-11.2.0/python-3.9.6-5amy32qig2nbj7ti7ehht3y2vbmdc2j7/lib/python3.9/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home1/malonema/.local/bin/tldr", line 1525, in process_cluster
qual = qual[::-1]
TypeError: 'NoneType' object is not subscriptable
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home1/malonema/.local/bin/tldr", line 2128, in <module>
main(args)
File "/home1/malonema/.local/bin/tldr", line 1907, in main
processed_clusters.append(res.get())
File "/spack/apps2/linux-centos7-x86_64/gcc-11.2.0/python-3.9.6-5amy32qig2nbj7ti7ehht3y2vbmdc2j7/lib/python3.9/multiprocessing/pool.py", line 771, in get
raise self._value
TypeError: 'NoneType' object is not subscriptable

So I guess it wasn't that problem?

@malonematt
Author

Although, now that I'm going through it, that qual = qual[::-1] line isn't on line 1525 in the updated code; it's on 1529.

So maybe it's just still running the old tldr.

@malonematt
Author

Yup, the one in my actual conda directory didn't update. Which I find confusing since I did a fresh install. But I'll just update it by hand. I'll let you know if this fixes the problem.
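
For anyone hitting the same confusion, a quick way to see which copy of a script is actually picked up is to list every executable with that name on PATH. A minimal sketch using only the standard library; `executables_on_path` is a hypothetical helper name, and it assumes the stale copy is found through PATH.

```python
import os
import shutil


def executables_on_path(name, path=None):
    """List every executable called `name` on PATH, in lookup order."""
    path = os.environ.get("PATH", "") if path is None else path
    found = []
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            if candidate not in found:
                found.append(candidate)
    return found


print(shutil.which("tldr"))         # the copy found first on PATH, or None
print(executables_on_path("tldr"))  # every copy, first match first
```

If more than one path is listed, the first one is the copy that runs, which may not be the one that was just reinstalled.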

@malonematt
Author

So that fix was able to resolve that error, but now it generates a different error:

2022-05-26 02:48:27,604 skipped a read without seq/qual: ed5106b4-53fa-4f23-85ed-f1720a965b0d
2022-05-26 02:48:27,604 skipped a read without seq/qual: dbfb5709-60e4-4cbd-abfc-0971901cbcaf
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/spack/apps2/linux-centos7-x86_64/gcc-11.2.0/python-3.9.6-5amy32qig2nbj7ti7ehht3y2vbmdc2j7/lib/python3.9/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home1/malonema/.local/bin/tldr", line 1536, in process_cluster
cluster.spanning_non_supporting_reads(int(args.wiggle), int(args.min_te_len))
File "/home1/malonema/.local/bin/tldr", line 399, in spanning_non_supporting_reads
for r in bam.fetch(self.chrom(), te_ins_start, te_ins_end):
File "pysam/libcalignmentfile.pyx", line 1091, in pysam.libcalignmentfile.AlignmentFile.fetch
File "pysam/libchtslib.pyx", line 690, in pysam.libchtslib.HTSFile.parse_region
ValueError: start out of range (-271)
"""

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home1/malonema/.local/bin/tldr", line 2132, in <module>
main(args)
File "/home1/malonema/.local/bin/tldr", line 1911, in main
processed_clusters.append(res.get())
File "/spack/apps2/linux-centos7-x86_64/gcc-11.2.0/python-3.9.6-5amy32qig2nbj7ti7ehht3y2vbmdc2j7/lib/python3.9/multiprocessing/pool.py", line 771, in get
raise self._value
ValueError: start out of range (-271)

It looks like these are unrelated, so let me know if you'd like me to open a separate issue for it.

@malonematt
Author

I found a somewhat similar issue here:
ComputationalSystemsBiology/ExoProfiler#6

In that issue, a region near the start of a chromosome is extended past the start, which produces a negative coordinate that pysam rejects.

Do you think --extend_consensus could be causing the same problem?

@malonematt
Author

Looks like you ran into this problem in:
#8
I'll go look at your fix to see if that helps me figure things out

@malonematt
Author

I did try another run without --extend_consensus (but still with --detail_output) and it threw the same error

@malonematt
Author

I think what's happening is in this section of code:
`for bampath in set([read.bampath for read in self.reads if read.useable]):
    bamname = '.'.join(os.path.basename(bampath).split('.')[:-1])
    bam = pysam.AlignmentFile(bampath)

    te_ins_start = int(self.breakpoints[0])
    te_ins_end   = int(self.breakpoints[1])

    for r in bam.fetch(self.chrom(), te_ins_start, te_ins_end):
        if r.is_secondary or r.is_supplementary:
            continue
`
Where I think te_ins_start is getting assigned that negative number.
Just a guess.
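
If that is the cause, clamping the coordinates before they reach bam.fetch would avoid the ValueError. A minimal sketch, assuming 0-based half-open coordinates as pysam's fetch uses; `clamp_region` is a hypothetical helper, not part of tldr or pysam, and the end coordinate in the example is made up (only -271 comes from the error above).

```python
def clamp_region(start, end, chrom_length=None):
    """Clamp a 0-based half-open interval to [0, chrom_length]."""
    start = max(0, start)           # a negative start is out of range
    end = max(start, end)           # keep the interval non-inverted
    if chrom_length is not None:
        end = min(end, chrom_length)
        start = min(start, end)
    return start, end


# -271 is the start from the error message; 350 is an arbitrary example end.
region = clamp_region(-271, 350)
print(region)
```

This would only hide the symptom, though. A negative breakpoint might also mean the breakpoint itself was computed incorrectly upstream, so it could be worth logging the cluster when clamping happens.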
