Generate ML features with Galaxy #30
Comments
@SantaMcCloud can you wrap https://github.com/raw-lab/mercat2 ?
Yes, I will do it quickly at the weekend!
@paulzierep it seems that mercat2 may have some bugs when run via Docker, which means the wrapper might take a while to finish. I opened an issue to check whether the errors are genuine or whether anything needs to be fixed: raw-lab/mercat2#14
Maybe we can use this tool instead: https://github.com/refresh-bio/KMC
But we need to add it to bioconda
Wrong, it is already there: https://github.com/bioconda/bioconda-recipes/blob/master/recipes/kmc/meta.yaml
Maybe we could apply the diversity estimation of mercat2 to the k-mers produced by KMC ... that should be doable.
@paulzierep I will add this tool this weekend then. The mercat2 maintainers did respond to the issue I created, so the bugs can be fixed in the next few weeks!
Well, if the new tool works, I think we do not need the other one, maybe just a small script to compute diversity... but could you first check whether KMC dump works on a small dataset locally and add a snippet here?
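A minimal sketch of what such a small diversity script could look like. It assumes the dump file is plain text with one k-mer and its count per line, separated by whitespace; that layout is an assumption and should be checked against a real dump. Shannon and Simpson indices are used as illustrative choices only, and the function names are hypothetical, not taken from mercat2 or KMC.

```python
import math


def read_kmer_counts(lines):
    """Parse 'KMER<whitespace>COUNT' lines into a dict (assumed dump layout)."""
    counts = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        kmer, count = parts
        counts[kmer] = counts.get(kmer, 0) + int(count)
    return counts


def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over k-mer relative abundances."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


def simpson_diversity(counts):
    """Simpson index 1 - sum(p^2) over k-mer relative abundances."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


if __name__ == "__main__":
    # Tiny made-up example, not real KMC output.
    example = ["AAA\t2", "AAC\t2", "ACG\t4"]
    counts = read_kmer_counts(example)
    print(shannon_diversity(counts), simpson_diversity(counts))
```

To run it on a real file, the lines could be read with `open(path)` and passed to `read_kmer_counts`, which accepts any iterable of strings.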
Okay, I checked the tool and how it works. These are the options to run it:
For this tool, either a single file can be used per run, or you can give it a list stating the path of each file. @paulzierep do you like both options, or do you prefer single-only or multiple-only? The output is 2 binary files. These 2 files can then be used for the other functions, which are:
To create a list where each k-mer is listed with the number of times it appears, we need the tool
With this we can get, in this example, the dump file where every k-mer is listed. Now my question to you @paulzierep: should I include everything, or just the basics we are interested in for the workflow? In this case the dump file and maybe the histogram file? Example dump file (snippet):
Example histogram file (complete file):
I am currently testing https://usegalaxy.eu/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Ffastk_fastk%2Ffastk_fastk%2F1.1.0%2Bgalaxy2&version=latest, which we already have in Galaxy. It seems super fast and has worked so far ...
For https://github.com/refresh-bio/KMC I do not get why the dump file has 255 for all counts. Any idea?
Okay, if this does not work, let me know. I am starting to wrap https://github.com/refresh-bio/KMC now, but I can look into the issue or ask them.
It seems that there is no information about this, so the only way to find out why is to open an issue.
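One thing that may be worth ruling out before opening the issue is counter saturation: the KMC usage text lists a `-cs` option for the maximal counter value, and I believe its default is 255, but that has not been verified here. The sketch below only measures how many counts in a dump sit at a given cap. It assumes the same one k-mer and one count per line layout, and the function name is hypothetical.

```python
def saturation_fraction(lines, cap=255):
    """Return the fraction of k-mers whose count is at or above `cap`.

    `lines` is any iterable of 'KMER<whitespace>COUNT' strings
    (assumed dump layout). Malformed lines are ignored.
    """
    total = 0
    capped = 0
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        total += 1
        if int(parts[1]) >= cap:
            capped += 1
    return capped / total if total else 0.0


if __name__ == "__main__":
    # Made-up example lines, not real KMC output.
    example = ["AAA\t255", "AAC\t255", "ACG\t3", "CGT\t17"]
    print(saturation_fraction(example))
```

A fraction close to 1.0 would suggest the counts are hitting a ceiling rather than reflecting true abundances, in which case rerunning the counting step with a higher cap would be the next thing to try.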