GFF3::seqID #12
Comments
I believe the genomic position is the best option there. miRNAs that map to multiple locations can be designated with the additional numbering, as you suggested.
Thanks, I'll keep this open, but I'll move on to the next question.
In our own miRNA-Seq analysis tool that we are finalizing (yes, one more tool in an already quite large toolbox...), we work around this by creating "genomic_location" groups of sequences that share the same unique or multiple genomic locations of origin. We allow a wiggle of [user-defined] nucleotides to group isomiRs together, and sequences can be in the same genomic location group only if they share the same location set. For example, an isomiR that maps to two genomic locations won't be in the same genomic location group as an isomiR that maps to only one of the two locations. I'm not sure that helps much for this specific task, but I think that choosing one genomic location over another, when a sequence is as likely to come from one as from the other, creates a bias... In our software, the genomic location ID refers to a series of genomic locations, for example "11:27256137-27256115;5:29390312-29390334", which shows two putative locations for an isomiR group with the strand information embedded too (here the first location is on the reverse strand, while the second location is on the forward strand).
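A minimal Python sketch of how such a genomic location ID could be parsed and used as a grouping key. It assumes, based on the example above, that the strand is encoded by coordinate order (first coordinate larger than the second means reverse strand). `Location`, `parse_location_id` and `group_by_location_set` are hypothetical names, not the tool's actual code, and the user-defined wiggle is left out, so only identical location sets end up in the same group.

```python
from collections import defaultdict
from typing import NamedTuple


class Location(NamedTuple):
    chrom: str
    start: int  # lower coordinate, whatever the strand
    end: int    # higher coordinate
    strand: str


def parse_location_id(loc_id: str) -> frozenset:
    """Parse an ID such as '11:27256137-27256115;5:29390312-29390334'.

    Assumption: first coordinate > second coordinate means reverse strand.
    """
    locations = set()
    for part in loc_id.split(";"):
        chrom, coords = part.rsplit(":", 1)
        first, second = (int(x) for x in coords.split("-"))
        strand = "-" if first > second else "+"
        locations.add(Location(chrom, min(first, second), max(first, second), strand))
    return frozenset(locations)


def group_by_location_set(hits: dict) -> dict:
    """Group sequences whose location sets are identical (no wiggle applied)."""
    groups = defaultdict(list)
    for seq_name, loc_id in hits.items():
        groups[parse_location_id(loc_id)].append(seq_name)
    return dict(groups)


# Hypothetical isomiRs: a and b share both loci, c maps to only one of them.
groups = group_by_location_set({
    "isomiR_a": "11:27256137-27256115;5:29390312-29390334",
    "isomiR_b": "5:29390312-29390334;11:27256137-27256115",
    "isomiR_c": "5:29390312-29390334",
})
print(len(groups))  # 2
```

Using a `frozenset` as the key makes the grouping independent of the order in which locations are listed, and an isomiR mapping to both loci never shares a group with one mapping to only one of them.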
In my opinion, as Thomas pointed out, embedding the strand information is critical, as there are many examples of miRNAs located on opposite strands at the same genomic position (e.g. hsa-mir-499a & hsa-mir-499b).
Dear members of the mirtop project, I've been adding miRGFF3 output support to a miRNA-seq pipeline, and I have a doubt about the seqID value. In the examples mentioned here and in the preprint, I've seen that the precursor ID should go in this column. What about the mature ID? Could it be used instead? Or would it create compatibility issues when using mirtop? And on a side note, thank you for developing this. I've been struggling with the isomiR definition myself, and this will be a very useful project for the miRNA community!
Hi @FlorianThibord, thanks so much for the question. I don't think we had thought about this, but it is a valid point. We can try to adapt our tool to be compatible with that. It shouldn't be a lot of work, but I would need some test files to work with. Normally we always use the same sequences to test the tool and all the functions we code. Just out of curiosity, do you detect isomiRs that are -2 nt at the 5' end relative to the reference sequence (i.e. starting 2 nt upstream of it)? In that case, do you have information about whether those 2 nt map to the precursor, or do you just not look at that? Let me know if this plan works for you and I will send you the sequences I need in the GFF3 format you are producing, where the seqID is the mature one. Thanks! :) PS: You are welcome to join if you want to be more involved, let me know!
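To illustrate why the -2 nt case matters when the seqID is the mature miRNA: GFF3 coordinates are 1-based and must be at least 1, so a 5' extension has no valid start in a mature-based frame, while it does in the precursor frame. This sketch assumes the isomiR is described by offsets of its 5' and 3' ends relative to the reference mature (negative = upstream); the function name and the example positions are hypothetical. It only computes coordinates; whether the extra nucleotides actually match the precursor still has to be checked against the sequence.

```python
def isomir_coordinates(mature_start, mature_end, offset_5p, offset_3p):
    """Return 1-based (start, end) of an isomiR in the precursor and mature frames.

    mature_start/mature_end: position of the reference mature in the precursor.
    offset_5p/offset_3p: shift of each isomiR end relative to the reference
    mature (negative = upstream, positive = downstream). Hypothetical encoding.
    """
    precursor = (mature_start + offset_5p, mature_end + offset_3p)
    mature_len = mature_end - mature_start + 1
    mature = (1 + offset_5p, mature_len + offset_3p)
    return {
        "precursor": precursor,
        "mature": mature,
        # GFF3 requires start >= 1, so a 5' extension cannot be written
        # when the seqID is the mature miRNA itself.
        "mature_frame_valid": mature[0] >= 1,
    }


# Hypothetical mature at positions 6-27 of its precursor; isomiR with 2 extra nt at 5'.
print(isomir_coordinates(6, 27, -2, 0))
# {'precursor': (4, 27), 'mature': (-1, 22), 'mature_frame_valid': False}
```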
Thanks @lpantano for your reply. Of course, I'll gladly produce some test files in that format if it will help. And sure, I'd be happy to bring my modest contribution to the project!
Perfect. It has the standard Illumina adapter: TGGAATTCTCGGGTGCCAAGGAACTC. Can you tell me the affiliation you want to use to join the team? Thanks
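For preparing such test files, here is a naive 3' adapter trimming sketch using the adapter above. It only does exact matching (the full adapter anywhere in the read, or an adapter prefix of at least `min_overlap` nt at the read end) with no mismatch tolerance, so it is a stand-in for a dedicated trimmer such as cutadapt, not a replacement. `trim_3p_adapter` and `min_overlap` are hypothetical names.

```python
ADAPTER = "TGGAATTCTCGGGTGCCAAGGAACTC"


def trim_3p_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 6) -> str:
    """Remove a 3' adapter using exact matching only (no mismatches allowed)."""
    # Case 1: the full adapter occurs in the read; cut at its first occurrence.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends inside the adapter; try the longest prefix first.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter found (or overlap shorter than min_overlap): keep the read.
    return read


insert = "TAGCTTATCAGACTGATGTTGA"  # example insert sequence
print(trim_3p_adapter(insert + ADAPTER[:8]))
```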
Great, I'll get working on it ASAP.
Hi, concerning my affiliation: Florian Thibord, PhD student. INSERM UMR_S 1219, Bordeaux Population Health Research Center, University of Bordeaux, Bordeaux, France
Thanks for doing this. I think it is almost perfect. I only have a couple of requests: the version in the file is correct, but the UID is from version 1.0. We moved to a more commonly used ID, from MINTplates. Is there any way you could use the … Other minor details:
After that, it would be pretty easy to integrate this into mirtop! Thanks again!
Hi, thanks for the feedback.
Hey @FlorianThibord, did you have a chance to update the UID? If not, you can remove it and I will adapt mirtop to be compatible with that, as long as you add the sequence to the line. Thanks!
Hi @lpantano
Hi @FlorianThibord, I think it is almost there. I noticed a couple of typos:
Thanks a bunch! We are almost there.
Thanks @lpantano,
Hi all again!
cc: @lpantano @gurgese @ThomasDesvignes @mhalushka @mlhack @keilbeck @BastianFromm @ivlachos @TJU-CMC
I will open one issue per column type, one at a time. Let's see if that makes it easier to get at least a few people commenting.
The first column is for the chromosome ID. That raises the question of whether we should use the genomic position or the precursor position, or allow both if the header can specify the exact database and version for the precursor.
I am starting to lean towards using the genomic position, because it should be the same across databases, on the condition that the hairpin is also included in the file as a parent feature. It will be like:
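A rough Python sketch of what such a file could look like, with the hairpin as a parent feature and an isomiR pointing to it through the standard GFF3 `Parent` attribute, both in genomic coordinates. The feature types (`miRNA_primary_transcript` as used in miRBase GFF3 files, `isomiR`), IDs, source and coordinates are hypothetical placeholders, not values from miRBase or mirtop.

```python
def gff3_line(seqid, source, ftype, start, end, strand, attributes):
    """Format one GFF3 record: 9 tab-separated columns, 1-based coordinates."""
    if not 1 <= start <= end:
        raise ValueError("GFF3 requires 1 <= start <= end")
    attr = ";".join(f"{key}={value}" for key, value in attributes.items())
    # Score (column 6) and phase (column 8) are unused here, written as '.'.
    return "\t".join([seqid, source, ftype, str(start), str(end), ".", strand, ".", attr])


# Hypothetical hairpin on chr1 and one isomiR inside it, both in genomic coordinates.
hairpin = gff3_line("chr1", "example", "miRNA_primary_transcript", 1000, 1080, "+",
                    {"ID": "hairpin1", "Name": "hairpin1"})
isomir = gff3_line("chr1", "example", "isomiR", 1018, 1039, "+",
                   {"Name": "mature1", "Parent": "hairpin1"})

print("##gff-version 3")
print(hairpin)
print(isomir)
```

Because the seqID is the chromosome, the coordinates stay the same whichever database the hairpin annotation comes from; the `Parent` link is what ties the isomiR back to its precursor.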
The only thing I am not clear about is what to do with miRNAs that have multiple precursors in the genome. The only option I can think of is adding an attribute like
other_parents=hairpin2,hairpin3...
and those parents should be in the GFF3 file as well. Please comment with new ideas, whether you agree or disagree, or if there is a scenario I am missing...
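A hedged sketch of how a consumer could check the proposed attribute: parse column 9, collect all `ID`s defined in the file, and report any `Parent` or `other_parents` value that is not defined. The `other_parents` name comes from the proposal above; the comma-separated format and the helper names are assumptions, and URL-escaped attribute values are not decoded here. Note that the GFF3 specification already allows `Parent` to hold several comma-separated IDs, so that could be another option to discuss.

```python
def parse_attributes(column9: str) -> dict:
    """Parse a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if field:
            key, _, value = field.partition("=")
            attrs[key] = value
    return attrs


def missing_parents(lines: list) -> dict:
    """Map each feature (by Name) to parent IDs that are not defined in the file."""
    defined_ids = set()
    parsed = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        attrs = parse_attributes(line.split("\t")[8])
        parsed.append(attrs)
        if "ID" in attrs:
            defined_ids.add(attrs["ID"])
    report = {}
    for attrs in parsed:
        parents = []
        for key in ("Parent", "other_parents"):
            if key in attrs:
                parents += attrs[key].split(",")
        absent = [p for p in parents if p not in defined_ids]
        if absent:
            report[attrs.get("Name", "?")] = absent
    return report


# Hypothetical file: hairpin3 is referenced but never defined.
lines = [
    "##gff-version 3",
    "\t".join(["chr1", "example", "miRNA_primary_transcript", "1000", "1080", ".", "+", ".", "ID=hairpin1"]),
    "\t".join(["chr5", "example", "miRNA_primary_transcript", "500", "580", ".", "-", ".", "ID=hairpin2"]),
    "\t".join(["chr1", "example", "isomiR", "1018", "1039", ".", "+", ".",
               "Name=mature1;Parent=hairpin1;other_parents=hairpin2,hairpin3"]),
]
print(missing_parents(lines))  # {'mature1': ['hairpin3']}
```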
Thanks!