demultiplexing fast5 #46

aspitaleri · 2021-03-09T22:28:03Z

Hi
From #14, it seems that MotifSeq could demultiplex a bundle of fast5 files to retrieve the fast5 files for each strain. Is this right? Or are there some options in SquiggleKit to do the same?
Thanks

Psy-Fer · 2021-03-11T22:53:26Z

It depends on how you are demultiplexing, and what do you mean by strains?

More information is required for me to answer your question.

aspitaleri · 2021-03-11T23:03:39Z

Hi
Basically, I have a bundle of fast5 files from a MinION run that includes sequencing from different bacterial strains (i.e. samples). Normally, I basecall and then demultiplex the fastq with guppy. Now I'd like to demultiplex directly on the fast5 files, i.e. split them per barcode without basecalling first.
Hope it is clear.

Psy-Fer · 2021-03-11T23:28:18Z

Oh, right. fast5_fetcher_multi paired with ont_fast5_api would be the tool for that.

For each barcodeXX.fastq file, do something like this:

mkdir dmux_barcode01_single

# extract the individual fast5 files
python3 fast5_fetcher_multi.py -q barcode01.fastq -s sequencing_summary.txt -m /path/to/fast5s/ -o ./dmux_barcode01_single/

# package the individual files up again (I really should just do this in fast5_fetcher...one day)
single_to_multi_fast5 -i dmux_barcode01_single/ -s dmux_barcode01_multi --filename_base barcode01

# remove intermediate fast5 files
rm -r dmux_barcode01_single/
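To run the same three steps over every barcodeXX.fastq in one go, something like the following Python sketch could work. It is a hypothetical helper, not part of SquiggleKit: it assumes fast5_fetcher_multi.py sits in the current directory, that the summary is named sequencing_summary.txt, and that single_to_multi_fast5 takes its output folder via -s/--save_path (as I recall from ont_fast5_api; check `single_to_multi_fast5 --help` for your version).

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path

# Assumed locations (hypothetical, adjust to your setup).
FETCHER = "fast5_fetcher_multi.py"  # SquiggleKit script
SUMMARY = "sequencing_summary.txt"
FAST5_DIR = "/path/to/fast5s/"


def commands_for(fastq: Path) -> tuple[Path, list[list[str]]]:
    """Build the fetch and repack commands for one barcodeXX.fastq file."""
    barcode = fastq.stem  # e.g. "barcode01"
    single_dir = Path(f"dmux_{barcode}_single")
    fetch = ["python3", FETCHER, "-q", str(fastq), "-s", SUMMARY,
             "-m", FAST5_DIR, "-o", f"./{single_dir}/"]
    # -s here is ont_fast5_api's save path (assumed; see single_to_multi_fast5 --help)
    repack = ["single_to_multi_fast5", "-i", f"{single_dir}/",
              "-s", f"dmux_{barcode}_multi", "--filename_base", barcode]
    return single_dir, [fetch, repack]


def run_all(fastq_dir: Path) -> None:
    """Fetch, repack and clean up for every barcode*.fastq in fastq_dir."""
    for fastq in sorted(fastq_dir.glob("barcode*.fastq")):
        single_dir, cmds = commands_for(fastq)
        single_dir.mkdir(exist_ok=True)  # the output dir must exist before fetching
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # stop on the first failing step
        shutil.rmtree(single_dir)  # remove the intermediate single-read fast5s
```

From the folder holding the demultiplexed fastq files, `run_all(Path("."))` processes barcode01, barcode02, ... in turn. Since `commands_for` only builds the command lists, you can print them first as a dry run.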

This will use the read IDs in the demultiplexed fastq file, match them to the fast5 filenames in the sequencing summary, find them in the path given with -m, and save them to the directory given with -o. The output directory should be created before running fast5_fetcher_multi.
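The matching step boils down to a join between the read IDs in the fastq and the `read_id`/`filename` columns of the sequencing summary. A minimal Python sketch of that idea, assuming standard 4-line FASTQ records and a tab-separated summary with `read_id` and `filename` columns (this is not fast5_fetcher_multi's actual code):

```python
import csv
from collections import defaultdict


def fastq_read_ids(lines):
    """Collect read IDs from FASTQ header lines (first token after '@').

    Assumes standard 4-line FASTQ records, as written by guppy.
    """
    ids = set()
    for i, line in enumerate(lines):
        if i % 4 == 0 and line.startswith("@"):
            ids.add(line[1:].split()[0])
    return ids


def reads_by_fast5(summary_lines, wanted):
    """Group the wanted read IDs by the fast5 file that holds them.

    Assumes a tab-separated sequencing summary with 'filename' and 'read_id' columns.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(summary_lines, delimiter="\t"):
        if row["read_id"] in wanted:
            groups[row["filename"]].append(row["read_id"])
    return dict(groups)
```

For example, `with open("barcode01.fastq") as fq, open("sequencing_summary.txt") as ss: groups = reads_by_fast5(ss, fastq_read_ids(fq))` gives, per multi-read fast5 file, the reads to pull out of it.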

Then ont_fast5_api provides a script called single_to_multi_fast5, which packs the extracted fast5s back into multi-read files.

Note that if you are on a system with hard limits on the number of files, like an HPC, check how many reads are in each barcodeXX.fastq file, as each read will produce one fast5 file, so you could hit the limit. If that is an issue, you can split the file up and run it in parts, or extract the read IDs manually and use only ont_fast5_api to extract the reads.
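To check the read count up front and, if needed, split the read IDs into batches, a small sketch like this could help. The batch size and the `_partNNN.txt` naming are arbitrary, hypothetical choices; each list could then be passed to an ont_fast5_api tool that takes a read ID list (fast5_subset, for example; check its `--help` for the expected list format).

```python
def count_fastq_reads(lines):
    """Reads in a 4-line-record FASTQ; fast5_fetcher_multi writes one fast5 per read."""
    return sum(1 for _ in lines) // 4


def chunk_read_ids(read_ids, chunk_size):
    """Split read IDs into sorted batches of at most chunk_size IDs."""
    ids = sorted(read_ids)
    return [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]


def write_id_lists(read_ids, chunk_size, prefix):
    """Write one read ID per line to prefix_part000.txt, prefix_part001.txt, ...; return paths."""
    paths = []
    for k, chunk in enumerate(chunk_read_ids(read_ids, chunk_size)):
        path = f"{prefix}_part{k:03d}.txt"
        with open(path, "w") as fh:
            fh.write("\n".join(chunk) + "\n")
        paths.append(path)
    return paths
```

Pick a chunk size comfortably below your per-directory or per-user file quota, and process one batch at a time, deleting intermediates before starting the next.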

I hope this helps.

aspitaleri · 2021-03-11T23:37:55Z

Right. So there is no way to avoid going through basecalling/demultiplexing first, i.e. without using fastq files? Actually, the MinION run produces a sequencing_summary.txt even when generating fast5 only. Could I use that file to assign each read to a barcode?

Psy-Fer · 2021-03-11T23:56:07Z

Ahh, well, the only DNA signal-level demultiplexer I know of is Deepbinner, but it's deprecated now, if I remember correctly.

MotifSeq isn't sensitive enough to do it as well as base-level demultiplexing.

So no, there isn't really an easy way to avoid basecalling.

aspitaleri · 2021-03-12T00:11:02Z

So the approach described here https://psy-fer.github.io/SquiggleKitDocs/MotifSeq/#background (Nanopore adapter identification) is not useful for this?

Psy-Fer · 2021-03-12T00:30:54Z

It would work, yes, but not as effectively as a demultiplexer working on basecalled sequence. Only a system using some form of machine learning/deep learning, like Deepbinner or what we have done with deeplexicon, would get similar or better results.
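For a feel of what signal-level matching involves, here is a minimal subsequence dynamic time warping (DTW) sketch: it finds where an expected squiggle (say, a barcode's predicted signal) best fits anywhere inside a read's raw signal. This is the generic textbook algorithm, not MotifSeq's or deeplexicon's code, and where the expected squiggle comes from (e.g. a pore model) is left as an assumption.

```python
import math


def znorm(x):
    """Z-normalise a signal so scale and offset differences (pA vs model levels) cancel."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x)) or 1.0
    return [(v - mu) / sd for v in x]


def subsequence_dtw(query, signal):
    """Best match of `query` anywhere inside `signal` (subsequence DTW).

    Returns (cost, start, end), with start/end as inclusive indices into `signal`.
    """
    m, n = len(query), len(signal)
    # Row 0: the query may start at any signal position at no extra cost (free start).
    cost = [abs(query[0] - signal[j]) for j in range(n)]
    start = list(range(n))
    for i in range(1, m):
        new_cost = [0.0] * n
        new_start = [0] * n
        new_cost[0] = cost[0] + abs(query[i] - signal[0])
        new_start[0] = start[0]
        for j in range(1, n):
            # Predecessors: diagonal, vertical (same sample), horizontal (same query point).
            best_c, best_s = min(
                (cost[j - 1], start[j - 1]),
                (cost[j], start[j]),
                (new_cost[j - 1], new_start[j - 1]),
            )
            new_cost[j] = best_c + abs(query[i] - signal[j])
            new_start[j] = best_s
        cost, start = new_cost, new_start
    end = min(range(n), key=cost.__getitem__)  # free end: cheapest last query row cell
    return cost[end], start[end], end
```

In practice both inputs would be normalised first (`znorm` here normalises a whole array; per-window normalisation is more accurate but slower). A naive classifier would run this once per barcode and pick the lowest cost; deciding between barcodes whose costs are close is the hard part, which is where the learned approaches mentioned above come in.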

Is there a particular reason to do this? Perhaps there is another solution.

aspitaleri · 2021-03-12T00:39:20Z

Well, my purpose is to bypass basecalling in order to remove one source of error, and then use the UNCALLED pipeline (https://github.com/skovaka/UNCALLED) to map the fast5 files to a reference genome, i.e. for amplicon analysis. That's why I need to demultiplex a MinION run into the different barcodes before mapping it.

Psy-Fer · 2021-03-12T00:52:00Z

UNCALLED uses the Read Until API. Are you planning to do the demultiplexing in real time, or are you looking to run UNCALLED after a run?

The accuracy of UNCALLED is not as good as basecalling and aligning, as the base sequence it uses is only an approximation.

aspitaleri · 2021-03-12T00:55:48Z

The idea is to run it after the run, on amplicons, so at very high depth (>4000), and then compare it with the standard procedure to check whether the approach is feasible. Thanks for your comments.

Psy-Fer · 2021-03-12T01:51:47Z

If you want to benchmark how well it does, you can use the regular (basecalled) demultiplexing results to split the UNCALLED output and assess it that way. Then, if it is better, look into demultiplexing with signal.
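A minimal sketch of that benchmarking split, assuming guppy's barcoding summary is a tab-separated file with `read_id` and `barcode_arrangement` columns, and that UNCALLED writes PAF, where column 1 is the read ID. Reads missing from the summary go to a hypothetical `unassigned` group, and the `uncalled_<barcode>.paf` output naming is likewise just an illustration.

```python
import csv
from collections import defaultdict


def load_barcodes(summary_lines, barcode_col="barcode_arrangement"):
    """read_id -> barcode from a basecaller barcoding summary (TSV with a header)."""
    return {row["read_id"]: row[barcode_col]
            for row in csv.DictReader(summary_lines, delimiter="\t")}


def split_paf_by_barcode(paf_lines, barcodes):
    """Group PAF records (column 1 = read ID) by the barcode assigned to that read."""
    groups = defaultdict(list)
    for line in paf_lines:
        if not line.strip():
            continue  # skip blank lines
        read_id = line.split("\t", 1)[0]
        groups[barcodes.get(read_id, "unassigned")].append(line)
    return dict(groups)


def write_groups(groups, prefix="uncalled"):
    """Write each barcode's records to <prefix>_<barcode>.paf."""
    for barcode, lines in groups.items():
        with open(f"{prefix}_{barcode}.paf", "w") as fh:
            fh.writelines(lines)
```

Each per-barcode PAF can then be compared with the alignments of the same barcode from the basecall-and-align route.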

There is a possibility I could extend the deeplexicon algorithms to DNA, rather than just RNA.

aspitaleri · 2021-03-12T08:58:51Z

Let me see if I understood correctly: basecall/demultiplex the fast5 files using e.g. guppy; then, as you suggested in #46 (comment), get the fast5 files per barcode using the sequencing_summary; then use the UNCALLED pipeline to get the fasta. Finally, compare the results. Right?

Psy-Fer · 2021-03-12T09:11:40Z

Sounds about right, yeah. Plus the fiddly bits in between. Good luck!

aspitaleri · 2021-03-12T09:32:19Z

Yep! I'll let you know how it goes.
If it is better, we will then need to think about how to avoid the basecalling step... but that's another story.
Thanks a lot for your help and comments.

Psy-Fer · 2021-03-12T09:38:27Z

You are welcome.

If that's the case, I'll build a demultiplexer.

aspitaleri · 2021-03-12T09:41:23Z

Wow, that sounds really great. Keep in touch then!

aspitaleri · 2021-03-12T09:43:33Z

I see that you indeed have something similar for RNA:
https://github.com/Psy-Fer/deeplexicon. Good to know.

Psy-Fer · 2021-03-30T06:26:04Z

Yes.

I'm going to extend that to DNA, and I'm planning to have something in a few months.

aspitaleri · 2021-03-30T07:16:18Z

That's great! I'll wait for your tool. If you need someone to test it before releasing it, I'll be happy to do so.

Psy-Fer self-assigned this Mar 11, 2021

Psy-Fer added the question (Further information is requested) label Mar 11, 2021
