investigate missing annotations from MGI imports pipeline - EXP "GO_Central", and "GOC" annotations should not have been excluded from the load #38

Closed
sierra-moxon opened this issue Feb 27, 2024 · 8 comments

@sierra-moxon
Member

from Li via:
geneontology/go-site#2043

IKR examples:
UniProtKB P60154 Rnase9 NOT|enables GO:0004540 PMID:15676279 IKR F Inactive ribonuclease-like protein 9 Rnase9 protein taxon:10090 20150710 GO_Central
UniProtKB Q5GAM7 Rnase13 NOT|enables GO:0004540 PMID:15676279 IKR F Probable inactive ribonuclease-like protein 13 Rnase13 protein taxon:10090 20150710 GO_Central
@sierra-moxon
Member Author

SMoxon@SMoxon-M82 gopreprocess % grep "UniProtKB:P60154" mgi-src.gpi      
MGI:MGI:3057273	Rnase9	ribonuclease, RNase A family, 9 (non-active)		SO:0001217	NCBITaxon:10090				UniProtKB:P60154	
PR:P60154	mRNASE9	inactive ribonuclease-like protein 9 (mouse)	mRNASE9	PR:000000001	NCBITaxon:10090	MGI:MGI:3057273			UniProtKB:P60154

This might be due to a non-1:1 mapping. Investigating.
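A minimal sketch of how the non-1:1 mapping could be checked in bulk rather than with one `grep` per ID. It assumes the GPI file is tab-separated with DB_Xrefs in the tenth column, as in the two rows above; the function names are hypothetical and not part of gopreprocess.

```python
from collections import defaultdict

XREF_COLUMN = 9  # zero-based index of DB_Xrefs (assumed from the rows above)


def entities_by_xref(gpi_lines):
    """Group GPI entity IDs (column 1) by each xref they carry."""
    mapping = defaultdict(list)
    for line in gpi_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= XREF_COLUMN:
            continue  # row too short to carry an xref
        for xref in cols[XREF_COLUMN].split("|"):
            if xref:
                mapping[xref].append(cols[0])
    return dict(mapping)


def non_unique_xrefs(mapping):
    """Keep only xrefs that point at more than one GPI entity."""
    return {xref: ids for xref, ids in mapping.items() if len(ids) > 1}


# The two rows returned by the grep above, rebuilt field by field.
sample = [
    "\t".join(["MGI:MGI:3057273", "Rnase9",
               "ribonuclease, RNase A family, 9 (non-active)", "",
               "SO:0001217", "NCBITaxon:10090", "", "", "",
               "UniProtKB:P60154", ""]),
    "\t".join(["PR:P60154", "mRNASE9",
               "inactive ribonuclease-like protein 9 (mouse)", "mRNASE9",
               "PR:000000001", "NCBITaxon:10090", "MGI:MGI:3057273", "", "",
               "UniProtKB:P60154"]),
]

ambiguous = non_unique_xrefs(entities_by_xref(sample))
print(ambiguous)
# {'UniProtKB:P60154': ['MGI:MGI:3057273', 'PR:P60154']}
```

Running the same functions over the whole `mgi-src.gpi` would list every UniProtKB ID that resolves to both an MGI gene and a PR entry.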

@sierra-moxon
Member Author

sierra-moxon commented Feb 27, 2024

the original annotation in GOA for the first IKR example is:

UniProtKB	P60154	Rnase9	NOT|enables	GO:0004540	PMID:15676279	IKR		F	Inactive ribonuclease-like protein 9	Rnase9	protein	taxon:10090	20150710	GO_Central		

it has an assigned_by of GO_Central ... there was a requirement for this ingest to ignore annotations in protein to GO that are from "MGI", "GO_Central", or "GOC".

I think my script is behaving as designed, so the next question is: if GO_Central provides this annotation, where does it come from? (Maybe it's in a different form, attached to a different ID, or included in some other ingest? I'm asking around on the MGI side. Do you have any insight, @kltm?)
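For reference, the exclusion rule as described in this comment can be sketched like this. This is an illustration only, not the actual gopreprocess code: it assumes a tab-separated GAF line with Assigned_By in column 15, and `is_skipped` is a hypothetical name.

```python
# Providers whose Protein2GO annotations the ingest ignores, per the rule above.
EXCLUDED_PROVIDERS = {"MGI", "GO_Central", "GOC"}
ASSIGNED_BY = 14  # zero-based index of Assigned_By (GAF column 15)


def is_skipped(gaf_line, excluded=EXCLUDED_PROVIDERS):
    """Return True when the annotation's assigned_by is an excluded provider."""
    cols = gaf_line.rstrip("\n").split("\t")
    return len(cols) > ASSIGNED_BY and cols[ASSIGNED_BY] in excluded


# The first IKR example from GOA, rebuilt field by field.
rnase9 = "\t".join([
    "UniProtKB", "P60154", "Rnase9", "NOT|enables", "GO:0004540",
    "PMID:15676279", "IKR", "", "F",
    "Inactive ribonuclease-like protein 9", "Rnase9", "protein",
    "taxon:10090", "20150710", "GO_Central", "", "",
])

print(is_skipped(rnase9))  # True
```

Under this rule the IKR annotation is dropped purely because of its provider, regardless of evidence code or reference.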

@sierra-moxon sierra-moxon self-assigned this Feb 27, 2024
@sierra-moxon
Member Author

What worries me about the IKR issue is that the GOA annotation records the provider as "GO_Central", yet the final GPAD and GAF produced by this test pipeline are missing these annotations.

Tracing the annotation, the path seems to be something like this:
GO_Central (in which GO internal pipeline or annotation step is this originally generated?) -> Protein2GO -> GoPreprocess (skips it) -> GO Pipeline -> final GPAD/GAF (missing)

So where does this annotation originally come from? (e.g. Noctua, some sort of external ingest into Noctua, etc.)

@kltm
Member

kltm commented Feb 27, 2024

To clarify a concrete example for the sake of @kltm
UniProtKB Q5GAM7 Rnase13 NOT|enables GO:0004540 PMID:15676279 IKR F Probable inactive ribonuclease-like protein 13 Rnase13 protein taxon:10090 20150710 GO_Central
This annotation is currently found in GO output products. However, given the rules of the (in progress) MGI pre-process import step from Protein2GO, anything that has source GO_Central is dropped. With this, our ingest is behaving as expected. The solutions would be to either 1) change the source in the p2go upstream or 2) change the rules if this is important.

@kltm
Member

kltm commented Feb 27, 2024

@LiNiMGI @ukemi how should we proceed on #38 (comment) ?

@kltm kltm assigned ukemi and LiNiMGI and unassigned sierra-moxon Feb 27, 2024
@LiNiMGI
Collaborator

LiNiMGI commented Feb 29, 2024

@sierra-moxon @kltm
We looked further into the GOA_mouse file:

All annotations with "assigned by:GO_Central" are:
GO_REF:0000033 IBAs/PAINT annotations
And 66 annotations that were mostly made or modified by Pascale with evidence code: IDA, IMP, IPI, IKR, ISS.
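A breakdown like the one above could be reproduced with a short tally over the GAF file. This is a sketch under assumptions: tab-separated GAF with the reference in column 6, the evidence code in column 7 and Assigned_By in column 15. The function names are hypothetical, and the IBA row in the sample is a made-up placeholder, not real data.

```python
from collections import Counter

# Zero-based indexes of GAF columns 6, 7 and 15.
REFERENCE, EVIDENCE, ASSIGNED_BY = 5, 6, 14


def tally_go_central(gaf_lines):
    """Count GO_Central annotations per (reference, evidence code) pair."""
    counts = Counter()
    for line in gaf_lines:
        if line.startswith("!"):
            continue  # skip GAF header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) > ASSIGNED_BY and cols[ASSIGNED_BY] == "GO_Central":
            counts[(cols[REFERENCE], cols[EVIDENCE])] += 1
    return counts


def gaf_row(object_id, symbol, qualifier, reference, evidence):
    """Build a 17-column GO_Central sample row (illustration only)."""
    return "\t".join([
        "UniProtKB", object_id, symbol, qualifier, "GO:0004540",
        reference, evidence, "", "F", "", symbol, "protein",
        "taxon:10090", "20150710", "GO_Central", "", "",
    ])


sample = [
    "!gaf-version: 2.2",
    gaf_row("P60154", "Rnase9", "NOT|enables", "PMID:15676279", "IKR"),
    gaf_row("Q5GAM7", "Rnase13", "NOT|enables", "PMID:15676279", "IKR"),
    # Hypothetical PAINT row: the ID and symbol are placeholders.
    gaf_row("X00000", "Example1", "enables", "GO_REF:0000033", "IBA"),
]

counts = tally_go_central(sample)
for (reference, evidence), n in sorted(counts.items()):
    print(reference, evidence, n)
```

Everything that is not `GO_REF:0000033` in such a tally is a candidate for inclusion in the load.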

MGI thinks we should include them in the load.

So maybe we need to modify the rules of import to:
only exclude GO_REF:0000033 with "assigned by:GO_Central"
or
include IDA, IMP, IPI, IKR, ISS annotations with "assigned by:GO_Central"

Thanks,
Li

@sierra-moxon
Member Author

sierra-moxon commented Feb 29, 2024

from managers call:
we thought the only thing coming was PAINT annotations but it turns out we get these from other places too

action: new rule: only exclude GO_REF:0000033 with "assigned by:GO_Central"

We thought these were PAINT, but there is also manual PAINT, and GO_Central is a curator in Protein2GO: these are experimental annotations from Pascale. There could also be some experimental evidence codes from other curators.
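The new rule from the action item could look roughly like the following. It is a sketch, not the merged implementation: it assumes tab-separated GAF input, treats the reference column as possibly pipe-separated, and covers only the GO_Central rule stated above. How other providers are handled is left out because this comment does not settle it. `is_skipped_revised` is a hypothetical name, and the IBA row is a made-up placeholder.

```python
PAINT_REFERENCE = "GO_REF:0000033"
REFERENCE, ASSIGNED_BY = 5, 14  # zero-based indexes of GAF columns 6 and 15


def is_skipped_revised(gaf_line):
    """Skip only GO_Central annotations that cite the PAINT reference."""
    cols = gaf_line.rstrip("\n").split("\t")
    if len(cols) <= ASSIGNED_BY:
        return False  # malformed or short row: nothing to decide here
    references = cols[REFERENCE].split("|")
    return cols[ASSIGNED_BY] == "GO_Central" and PAINT_REFERENCE in references


# First IKR example from this thread: should now be kept.
ikr_row = "\t".join([
    "UniProtKB", "P60154", "Rnase9", "NOT|enables", "GO:0004540",
    "PMID:15676279", "IKR", "", "F",
    "Inactive ribonuclease-like protein 9", "Rnase9", "protein",
    "taxon:10090", "20150710", "GO_Central", "", "",
])

# Hypothetical PAINT row (placeholder ID and symbol): should still be skipped.
iba_row = "\t".join([
    "UniProtKB", "X00000", "Example1", "enables", "GO:0004540",
    "GO_REF:0000033", "IBA", "", "F", "", "Example1", "protein",
    "taxon:10090", "20150710", "GO_Central", "", "",
])

print(is_skipped_revised(ikr_row))  # False
print(is_skipped_revised(iba_row))  # True
```

Keying the exclusion on the reference rather than the provider keeps the experimental GO_Central annotations in the load while still dropping the PAINT IBAs.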

@pgaudet pgaudet changed the title from "investigate missing IKR annotations from MGI imports pipeline" to "investigate missing annotations from MGI imports pipeline - "GO_Central", and "GOC" annotations should not have been excluded from the load" Mar 13, 2024
@pgaudet pgaudet changed the title from "investigate missing annotations from MGI imports pipeline - "GO_Central", and "GOC" annotations should not have been excluded from the load" to "investigate missing annotations from MGI imports pipeline - EXP "GO_Central", and "GOC" annotations should not have been excluded from the load" Mar 13, 2024
@pgaudet pgaudet moved this from In Progress to 2014-03-18 Sprint tasks in Integrate remainder of MGI pipeline into the GO pipeline Mar 15, 2024