demultiplexing fast5 #46
It depends on how you are demultiplexing, and what you mean by strains. I need more information to answer your question.
Hi
Oh, right. fast5_fetcher_multi paired with ont_fast5_api would be the tool for that. For each barcodeXX.fastq file, do something like this
This will use the read IDs in the demultiplexed fastq file, match them to the fast5 filenames in the sequencing summary, and find those files in the path you give it. Then the ont_fast5_api script single_to_multi_fast5 will pack the extracted fast5s back into multi-read files. Note that if you are on a system with hard limits on the number of files, like an HPC, check how many reads are in each barcodeXX.fastq file first: each read becomes one fast5 file, so you could hit those limits. If that is an issue, you can split the file up and run it in parts, or extract the read IDs manually and use only ont_fast5_api to extract the reads. I hope this helps.
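The matching step described above can be sketched in Python, for example if you extract the read IDs manually. This is a minimal sketch, not SquiggleKit or ont_fast5_api code: the helper names are hypothetical, and it assumes 4-line FASTQ records and a tab-separated sequencing summary whose header has `read_id` and `filename` columns. Column names differ between MinKNOW/Guppy versions, so check the header of your own file. `chunked` splits the read ID list into batches, one way to run in parts when file-number limits are a concern.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List


def read_ids_from_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Yield read IDs from FASTQ text (4 lines per record assumed).

    The read ID is the first whitespace-delimited token of the header line,
    without the leading '@'. Only every 4th line is checked, so quality
    lines that happen to start with '@' are not mistaken for headers.
    """
    for i, line in enumerate(lines):
        if i % 4 == 0 and line.startswith("@"):
            yield line[1:].split()[0]


def fast5_files_for_reads(summary_lines: Iterable[str], read_ids: Iterable[str],
                          id_col: str = "read_id",
                          file_col: str = "filename") -> Dict[str, str]:
    """Map wanted read IDs to the fast5 file named in a tab-separated
    sequencing summary. Column names are assumed defaults; adjust them
    to match your summary's header."""
    wanted = set(read_ids)
    hits = {}
    for row in csv.DictReader(summary_lines, delimiter="\t"):
        if row[id_col] in wanted:
            hits[row[id_col]] = row[file_col]
    return hits


def chunked(items: List[str], size: int) -> List[List[str]]:
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    # Toy inputs standing in for barcode01.fastq and sequencing_summary.txt.
    fastq = io.StringIO("@read_a runid=x\nACGT\n+\n!!!!\n@read_c\nGGCC\n+\n!!!!\n")
    summary = io.StringIO("filename\tread_id\nbatch_0.fast5\tread_a\n"
                          "batch_0.fast5\tread_b\nbatch_1.fast5\tread_c\n")
    ids = list(read_ids_from_fastq(fastq))
    print(fast5_files_for_reads(summary, ids))
    print(chunked(ids, 1))
```

Each batch could be written to its own read ID file and processed separately. How those lists are passed to ont_fast5_api depends on which of its scripts you use, so check its documentation for the exact options.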
Right. So there is no way to avoid going through basecalling/demultiplexing first, without using fastq files? Actually, MinION writes a sequencing_summary.txt during the run even when generating only fast5 files. Could I use that file to assign each read to a barcode?
Ahh, well, the only DNA signal-level barcode demultiplexer I know of is Deepbinner, but if I remember correctly it is deprecated now. MotifSeq isn't sensitive enough to demultiplex as well as base-level demultiplexing does. So no, there isn't really an easy way to avoid basecalling.
So the approach described in the Nanopore adapter identification section here, https://psy-fer.github.io/SquiggleKitDocs/MotifSeq/#background, is not useful for this?
It would work, yes, but not as effectively as a demultiplexer based on basecalled sequence. Only a system using some form of machine learning or deep learning, like Deepbinner or what we have done with deeplexicon, would get similar or better results. Is there a particular reason to do this? Perhaps there is another solution.
Well, my purpose is to bypass basecalling to remove one source of error, and then use the UNCALLED pipeline (https://github.com/skovaka/UNCALLED) to map the fast5 reads to a reference genome, i.e. for amplicon analysis. That's why I need to demultiplex a MinION run into the different barcodes before mapping.
UNCALLED uses the Read Until API. Are you planning to do the demultiplexing in real time, or are you looking to run UNCALLED after a run? The accuracy of UNCALLED is not as good as basecalling and aligning, as the base sequence it uses is only an approximation.
The idea is to run it after the run, on amplicons with very high depth (>4000), and then compare it with the standard procedure to check whether the approach is feasible. Thanks for your comments.
If you want to benchmark how well it does, you can use the regular demultiplexing results to split the UNCALLED output and assess it that way. Then, if it does better, look into demultiplexing with the signal. It is also possible for me to extend the deeplexicon algorithms to DNA, rather than just RNA.
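The benchmarking split suggested above can be sketched in Python. This is a minimal sketch under stated assumptions: UNCALLED's mappings are assumed to be in PAF format (where the first tab-separated column is the read name), and the regular demultiplexer is assumed to have written a tab-separated summary with `read_id` and `barcode_arrangement` columns. Both column names and the function names are hypothetical; adjust them to your own files.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List


def barcode_of_reads(barcode_summary: Iterable[str], id_col: str = "read_id",
                     barcode_col: str = "barcode_arrangement") -> Dict[str, str]:
    """Read a tab-separated demultiplexing summary into {read_id: barcode}.
    Column names are assumptions; adjust to your demultiplexer's header."""
    return {row[id_col]: row[barcode_col]
            for row in csv.DictReader(barcode_summary, delimiter="\t")}


def split_paf_by_barcode(paf_lines: Iterable[str],
                         read_barcodes: Dict[str, str]) -> Dict[str, List[str]]:
    """Group PAF records by barcode using the read name in column 1.
    Reads missing from the barcode table are grouped as 'unclassified'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in paf_lines:
        if not line.strip():
            continue  # skip blank lines
        read_id = line.split("\t", 1)[0]
        groups[read_barcodes.get(read_id, "unclassified")].append(line.rstrip("\n"))
    return dict(groups)


if __name__ == "__main__":
    # Toy inputs standing in for a barcoding summary and an UNCALLED PAF file.
    summary = io.StringIO("read_id\tbarcode_arrangement\n"
                          "r1\tbarcode01\nr2\tbarcode02\n")
    paf = io.StringIO(
        "r1\t5000\t10\t4900\t+\tref\t30000\t100\t5000\t4500\t4890\t60\n"
        "r3\t4000\t0\t3900\t-\tref\t30000\t200\t4100\t3500\t3900\t60\n")
    for barcode, records in split_paf_by_barcode(paf, barcode_of_reads(summary)).items():
        print(barcode, len(records))
```

Each group can then be compared with the alignments of the basecalled reads from the same barcode.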
Let me check that I understood correctly: basecall/demultiplex the fast5 files using e.g. Guppy; then, as you suggested in #46 (comment), get the fast5 files for each barcode using the sequencing_summary; then use the UNCALLED pipeline to get the fasta; finally, compare the results. Right?
Sounds about right, yeah. Plus the fiddly bits in between. Good luck!
Yep! I will update you on how it does.
You are welcome. If that turns out to be the case, I'll build a demultiplexer.
Wow, that sounds really great. Keep in touch then!
I see that you already have something similar, but for RNA.
Yes. I'm going to extend that to DNA. I'm planning to have something in a few months.
That's great! I will wait for your tool. If you need someone to help debug it before you release it, I will be happy to do it.
Hi
From #14, it seems that MotifSeq could demultiplex a bundle of fast5 files to retrieve the fast5 files for each strain. Is this right? Or are there options in SquiggleKit to do the same?
Thanks