gene ontology cross-annotation mapping #77
Hi @rsiani, thanks for the kind feedback!
Hi @oschwengers, apologies for the confusion. I will try to elaborate. Let's say I run the program like this: and among the results I get my nicely tabulated list of annotated sequences: Now, if I understood correctly, the column "DbXrefs" contains the identifier from each of the databases that returned a hit.
Thanks for the explanation, now I understand ... In conclusion, I see both the need for and the potential of such a feature, but also the considerable thought and effort it would take. Therefore, I doubt that this could be done in the near future. However, I'm open to all sorts of thoughts and discussions. Maybe one could start by implementing a set of complementary scripts that retrieve only a small subset of the metadata/information for UniRef90. Based on your result file, could you provide an example of the information you'd like to collect, and from which sources?
Exactly! And I totally agree with you: with the wealth of databases around, it would be a huge loss to rely on only a single database, but as soon as you start using more, the task of integrating the knowledge gets really complicated.
Sounds great - thanks!
In that sense, if I wanted to know how many members of each COG family I have, would it be enough to just parse the JSON or the TSV file and count them, or would you recommend re-mapping?
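For the counting approach asked about here, a minimal Python sketch of the TSV route could look like the following. It is only an illustration under stated assumptions: the function name `count_cog_families` is made up for this example, and it assumes the "DbXrefs" column holds comma-separated entries in which COG families appear as `COG:COGnnnn`. If the actual output uses a different layout, the prefix check needs adjusting.

```python
import csv
from collections import Counter
from typing import Iterable


def count_cog_families(lines: Iterable[str], column: str = "DbXrefs") -> Counter:
    """Count COG family IDs listed in the cross-reference column of a TSV.

    Assumptions (not confirmed in this thread): cross-references are
    comma-separated, and COG families are written as 'COG:COGnnnn'.
    """
    counts: Counter = Counter()
    header = None
    # QUOTE_NONE keeps quote characters inside product names from confusing the parser.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue
        if header is None:
            # Treat the first row that contains the column name as the header;
            # a leading '#' on the header or on earlier comment lines is tolerated.
            cells = [cell.lstrip("#").strip() for cell in row]
            if column in cells:
                header = cells
            continue
        if row[0].startswith("#"):
            continue  # stray comment line after the header
        idx = header.index(column)
        if len(row) <= idx:
            continue  # feature without any cross-references
        for xref in row[idx].split(","):
            xref = xref.strip()
            # Keep only family IDs; other prefixes and non-family COG entries are skipped.
            if xref.startswith("COG:COG"):
                counts[xref.split(":", 1)[1]] += 1
    return counts
```

It could be used as `with open("annotation.tsv") as fh: print(count_cog_families(fh).most_common(10))`, where `annotation.tsv` is a placeholder file name. This only counts what the annotation already assigned, so features without a COG cross-reference are not counted at all.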
Congrats on the release. I already tried Bakta on a couple of genomes I was studying and the results are really good, without being much slower than Prokka (the output also works fine with Roary).
Something that was also missing from Prokka, and that I always wanted from quick annotations, is clustering the genes into functional categories (GO-style, but KEGG and Pfam have similar features). This could probably be done by mapping against GO annotations, but since you use several different databases, the process seems quite convoluted. Any idea for a quick and dirty workaround?