Consensus on naming scheme #19

Closed
GallVp opened this issue Apr 18, 2024 · 3 comments


GallVp commented Apr 18, 2024

Current scheme: AGAT

  • Ross's Perl script (gene.t1, gene.t2, etc.)
  • mRNA or transcript (GFF3 requires mRNA)
  • liftoffID is acceptable (Ross)
  • For a gene whose isoforms have different descriptions: description=differing%20isoform%20descriptions
GallVp added the discussion needed label Apr 18, 2024

GallVp commented Apr 18, 2024

Summary of a call with @rosscrowhurst

  • NCBI requires that the text in the 'product' attribute adheres to a set of rules. These rules keep changing, and we are not aware of an automated validation tool.
  • JBrowse2 picks the 'description' or 'note' attribute from the gene feature to display as the annotation text. This capability is a high priority for us because many fairGenomes/JBrowse2 users have requested it. Therefore, we should populate the 'description' attribute for the gene features.
  • We use eggNOG-mapper to obtain functional annotations for transcripts. These annotations should be stored at both the transcript and the gene level under the 'description' attribute.
  • An experimental feature of pangene is support for multiple isoforms. This might be dropped later. If a gene has multiple isoforms and they have different functional annotations, this indicates a likely problem with gene prediction. In such a case, the gene-level 'description' will be 'differing isoform descriptions'.
  • To avoid pesky formatting failures, we should use URL encoding. Thus, the above description will be stored as: description=differing%20isoform%20descriptions
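
The gene-level rule and the URL encoding described above could be sketched roughly as follows. This is only an illustration, not pangene's actual code: the function names `gene_description` and `description_attribute` are hypothetical, and returning `None` when no transcript carries an annotation is an assumption, since the call did not cover that case.

```python
from urllib.parse import quote

# Placeholder text agreed on the call for genes whose isoforms disagree.
DIFFERING = "differing isoform descriptions"


def gene_description(transcript_descriptions):
    """Derive a gene-level description from its transcripts' descriptions.

    Assumption: empty or missing descriptions are ignored, and None is
    returned when nothing is left.
    """
    unique = {d.strip() for d in transcript_descriptions if d and d.strip()}
    if not unique:
        return None
    if len(unique) == 1:
        return next(iter(unique))
    return DIFFERING


def description_attribute(text):
    """Format a GFF3 'description' attribute with URL (percent) encoding."""
    # safe="" also encodes '/', ';', '=', '&' and ',' so they cannot be
    # mistaken for GFF3 attribute separators.
    return "description=" + quote(text, safe="")


# Two isoforms with different annotations fall back to the placeholder.
attr = description_attribute(gene_description(["Protein kinase", "Zinc finger"]))
print(attr)  # description=differing%20isoform%20descriptions
```

Passing `safe=""` is a design choice for this sketch: it makes sure characters that are reserved in GFF3 column 9 never appear unescaped inside the value.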

GallVp added this to the 0.4 milestone Apr 23, 2024

GallVp commented Apr 29, 2024

Notes:

  • @jasonshiller, @rosscrowhurst, @CeciliaDeng: Global transcript numbers are confusing. BRAKER uses t1, t2, and that is what we should use. Convention: geneXX.tYY
  • @rosscrowhurst: A single naming scheme might not work for every case and every user
  • Provide transformation tables/web apps for pan-genome
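
As a rough sketch of the geneXX.tYY convention and of what a transformation table might contain, the snippet below numbers transcripts per gene rather than globally and returns a mapping from the original transcript ID to the new one. The function name `per_gene_transcript_ids` and the example IDs are hypothetical, and numbering transcripts in input order is an assumption.

```python
from collections import defaultdict


def per_gene_transcript_ids(pairs):
    """Map each original transcript ID to '<gene>.t<N>'.

    `pairs` is an iterable of (gene_id, transcript_id) tuples. N restarts
    at 1 for every gene and follows the order of the input (assumption).
    """
    counters = defaultdict(int)
    mapping = {}
    for gene_id, transcript_id in pairs:
        counters[gene_id] += 1
        mapping[transcript_id] = f"{gene_id}.t{counters[gene_id]}"
    return mapping


# Hypothetical IDs with a global transcript counter, as produced upstream.
table = per_gene_transcript_ids([
    ("gene1", "mRNA_7"),
    ("gene1", "mRNA_8"),
    ("gene2", "mRNA_9"),
])
for old_id, new_id in table.items():
    print(old_id, new_id, sep="\t")
```

The returned dictionary doubles as a simple old-to-new transformation table that could be written out as a TSV file.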

GallVp self-assigned this Jun 20, 2024

GallVp commented Oct 6, 2024

GallVp closed this as completed Oct 6, 2024