Pyarrow error #135

Open
Psy-Fer opened this issue Jul 4, 2024 · 10 comments

@Psy-Fer

Psy-Fer commented Jul 4, 2024

Hey George,

I have a user getting a strange error. I've attached the issue below, where you can also see some more context.

Any ideas what the issue here might be?

Cheers
James


Dear James,

Thank you for the update. I just tried the newer version, and I am getting an error related to the pyarrow package:

   len(batch.signal[batch_row_index].as_buffer()),
AttributeError: 'pyarrow.lib.LargeListScalar' object has no attribute 'as_buffer'

Originally posted by @lborcard in Psy-Fer/blue-crab#12 (comment)

@0x55555555
Collaborator

Based on the error, I suspect the file is uncompressed and is hitting an unaccounted-for error. I'm not sure how it's possible to end up with an uncompressed file. How were the files created?

I'll keep digging on my side.

@lborcard

lborcard commented Jul 4, 2024

If I may intervene, I am the user with the error. The pod5 files were generated using Icarust (https://github.com/LooseLab/Icarust). They are compatible with dorado (I used it to basecall them).

@0x55555555
Collaborator

OK, I'm not familiar with how Icarust writes pod5 files, but I've finished investigating the pod5 source and found that this is due to a bug in how the Python pod5 bindings handle uncompressed pod5 files.

I have a fix internally that will resolve the issue, and I'll get it out asap.

  • George
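For readers who hit the same AttributeError, here is a minimal sketch of why the call fails and how a reader could guard against it. It assumes (this is not confirmed anywhere in this thread) that a compressed signal row arrives as a binary scalar, which exposes as_buffer(), while an uncompressed row arrives as a list scalar of 16-bit samples, which does not. The two stub classes and signal_byte_count are hypothetical stand-ins written for illustration; they are not part of pod5 or pyarrow, and this is not the fix that went into pod5.

```python
INT16_BYTES = 2  # assumption: uncompressed samples are 16-bit integers


class BinaryScalarStub:
    """Hypothetical stand-in for a binary scalar (compressed signal row)."""

    def __init__(self, payload: bytes):
        self._payload = payload

    def as_buffer(self):
        # Binary scalars can hand back their raw bytes as a buffer.
        return memoryview(self._payload)


class ListScalarStub:
    """Hypothetical stand-in for a list scalar (uncompressed signal row)."""

    def __init__(self, samples):
        self._samples = list(samples)

    def as_py(self):
        # List scalars have no as_buffer(); they convert to a Python list.
        return list(self._samples)


def signal_byte_count(scalar) -> int:
    """Return the byte size of a signal row for either scalar kind."""
    as_buffer = getattr(scalar, "as_buffer", None)
    if as_buffer is not None:
        # Compressed path: measure the raw buffer directly.
        return len(as_buffer())
    # Uncompressed path: count samples and multiply by the sample width.
    return len(scalar.as_py()) * INT16_BYTES
```

Calling len(row.as_buffer()) unconditionally, as in the traceback above, only works for the first kind of scalar. Checking for the attribute first is one way to avoid the crash.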

@Psy-Fer
Author

Psy-Fer commented Jul 4, 2024

This makes me ask the obvious question as well. Is pore_type still not used by nanopore software?

I was under the impression that MinKNOW had started using it. Is this a field that Icarust has decided to set but that is not actually used yet?

@0x55555555
Collaborator

No, sequencing runs on the current MinKNOW software do not set the pore type.

@Psy-Fer
Author

Psy-Fer commented Jul 4, 2024

Hmmm okay. Thanks.

@Adoni5

Adoni5 commented Jul 4, 2024

Ahh okay - @Psy-Fer I'm happy to change the Icarust code to set the Pore Type to "not-set" if that would be useful.

Psy-Fer changed the title from "Hey George, I have a user getting a strange error. I've attached the issue below, where you can also see some more context. Any ideas what the issue here might be? Cheers James ----" to "Pyarrow error" on Jul 4, 2024
@Psy-Fer
Author

Psy-Fer commented Jul 4, 2024

Please make it specifically not_set, with an underscore, to match the current pod5 output.

Feel free to use the test scripts in blue-crab as boilerplate to test if your files are correct.

I'll leave the R10.4.1 exception for pore_type in place so users of older versions of Icarust can still convert their files if they like.

James
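To make the naming point concrete, here is a rough sketch of the kind of check being described. The actual blue-crab code is not shown in this thread, so the function name, the set of accepted values, and the exact-match behaviour are all assumptions for illustration only.

```python
# Assumption: the converter accepts the pod5 default "not_set" and keeps a
# special case for "R10.4.1", the value mentioned for older Icarust output.
ACCEPTED_PORE_TYPES = {"not_set", "R10.4.1"}


def pore_type_is_convertible(pore_type: str) -> bool:
    """Hypothetical check: True if the pore_type value would be accepted."""
    # Exact string match, so "not-set" with a hyphen is rejected.
    return pore_type in ACCEPTED_PORE_TYPES
```

Under these assumptions, "not_set" and "R10.4.1" pass, while "not-set" with a hyphen does not, which is why the underscore matters.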

@0x55555555
Collaborator

I'm in the process of deploying 0.3.12, which contains a fix for the issue of opening raw data from uncompressed pod5 files.

Thanks,

  • George

@Adoni5

Adoni5 commented Jul 8, 2024

Thanks George.
