Update main pipeline output to produce usable GPAD/GPI 2.0 #2043

Closed
kltm opened this issue Aug 17, 2023 · 176 comments

Comments

@kltm
Copy link
Member

kltm commented Aug 17, 2023

This was created from a conversation with @ukemi and @sierra-moxon, making explicit an implicit task.

geneontology/gopreprocess#9

@ukemi
Copy link
Contributor

ukemi commented Oct 25, 2023

Location of @sierra-moxon's files

@sierra-moxon
Copy link
Member

first pass of merged GPAD (all noctua MGI annotations from current.geneontology.org + all preprocessed/upstream annotations produced in this mini-pipeline in a GPAD 2.0 file):
https://drive.google.com/drive/folders/1aZxvumsODSvXGbk_gMdFtuGhhAq4MKdL

I already see two issues: taxon isn't coming through the conversion for some of the rows, and some of the rows were labeled as provided_by MGI -- I think both of these issues come from the GAF->GPAD step and not as a result of the underlying GAF generation, but I am confirming.

@leemdi
Copy link

leemdi commented Nov 9, 2023

@sierra-moxon

I am seeing !gpa-version: 1.2 at the end of the merged_gpad_11_08_2023.txt.
should this file just contain 2.0?

thanks.
Lori

@ukemi
Copy link
Contributor

ukemi commented Nov 9, 2023

I am looking at the errors that Lori's load threw:

  1. We were missing an RO identifier that I have added to MGI
  2. There are several GO-REFS that we don't have in MGI. I will look at those closely and most likely add them.
  3. We don't have any of the Reactome references. I need to figure out what to do with those. I will need to track down how we handle those annotations.
  4. Lori says the load is filtering out a lot of duplicates. This doesn't surprise me.

Today I will also just do a sanity check on @sierra-moxon's file.

@ukemi
Copy link
Contributor

ukemi commented Nov 9, 2023

Notes from yesterday's group call.

  • Sierra made the file available to Lori for a first-run test load. See above comments
  • @kltm was concerned about the GAF-GPAD file conversions. In particular, he was very concerned that users and software are using the GPAD files in isolation from GPI files. This means that for any annotation to an object that is not a gene, no relationship to the gene can be made. After a long discussion, it was decided that this was not an issue specific to this project, but a bigger GOC issue that should be brought up on a managers' call.

@ukemi
Copy link
Contributor

ukemi commented Nov 9, 2023

Hi @sierra-moxon. I have a couple of questions to reassure myself that I didn't just think things without actually putting them into the requirements.

  1. We are not appending annotations to the mega-file when the UniProt identifier didn't map to an MGI gene identifier, but we would like a report of those that didn't.
  2. We are processing the IEA annotations for mouse that would be in the non-isoform file consumed in pipeline #329. We filter out the IEAs in the ISO loads.
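
A minimal sketch of requirement 1, assuming annotations are held as simple (identifier, GO term) pairs and the UniProt-to-MGI mapping is a dictionary. The function name and all identifiers below are hypothetical placeholders, not the gopreprocess implementation.

```python
from typing import Dict, Iterable, List, Tuple

Annotation = Tuple[str, str]  # (gene product ID, GO term ID)


def split_by_mapping(
    annotations: Iterable[Annotation],
    uniprot_to_mgi: Dict[str, str],
) -> Tuple[List[Annotation], List[Annotation]]:
    """Return (mapped, unmapped): mapped rows carry the MGI ID, unmapped rows go to a report."""
    mapped: List[Annotation] = []
    unmapped: List[Annotation] = []
    for uniprot_id, go_id in annotations:
        mgi_id = uniprot_to_mgi.get(uniprot_id)
        if mgi_id is None:
            # Not appended to the merged file; kept only for the report.
            unmapped.append((uniprot_id, go_id))
        else:
            mapped.append((mgi_id, go_id))
    return mapped, unmapped


# Hypothetical identifiers, for illustration only.
example_map = {"UniProtKB:EXAMPLE1": "MGI:MGI:0000001"}
example_rows = [
    ("UniProtKB:EXAMPLE1", "GO:0005886"),
    ("UniProtKB:EXAMPLE2", "GO:0005886"),
]
mapped, unmapped = split_by_mapping(example_rows, example_map)
print("append:", mapped)
print("report:", unmapped)
```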

@ukemi
Copy link
Contributor

ukemi commented Nov 9, 2023

After a bit of investigation:

  1. It looks like the Reactome annotations are being filtered at MGI. These lines are from @leemdi's unresolvedB.error:
    UniProtKB P01723 P01723 located_in GO:0005886 Reactome:R-MMU-983702 TAS C Ig lambda-1 chain V region protein taxon:10090 20120109 Reactome
    UniProtKB P01843 P01843 located_in GO:0005886 Reactome:R-MMU-983702 TAS C Ig lambda-1 chain C region protein taxon:10090 20120109 Reactome
    Although I only see 54 annotations in this error file, even though there are more than a thousand hits in the incoming gaf. @leemdi can you figure out where the others are being filtered?

  2. It looks like the IEAs are included as the missing refs are for methods that we didn't previously run at MGI. See my analysis here:
    https://docs.google.com/spreadsheets/d/1LwwN3RgyGsDQfdggczJ34Qu78XV-WtkB1JtdBOHPZw4/edit#gid=0

@ukemi
Copy link
Contributor

ukemi commented Nov 9, 2023

@leemdi and @sierra-moxon
It looks like the GPAD1.2 annotations are the ones from the Noctua output.

@deustp01
Copy link

deustp01 commented Nov 9, 2023

  1. Ig lambda-1 chain V region

@ukemi Could the nature of this immunoglobulin UniProt be causing its own specific problem? UniProt has separate instances for the constant region and the variable region of what occurs in the body, mouse or human, as a single polypeptide encoded by a gene that is not present in the germline but created somatically. The annotation is thus a hack in two ways (both unavoidable, as far as I can tell). First, it represents the full length immunoglobulin chain as a complex of a UniProt C protein and a separate UniProt V protein. Second, that V protein is an arbitrarily chosen single instance because there's no way to represent the diversity of possible V regions in this annotation.

@ukemi
Copy link
Contributor

ukemi commented Nov 9, 2023

Yep! I think you are spot on and this may be the case with the 54 annotations in that report! When I look at the report now that you point this out, all the genes are things like this (immunoglobulin regions, histones). So the one above might be a red herring wrt why all the annotations are failing our load. I also think that we are filtering on our end because we don't have Reactome reactions/pathways as references in MGI. It was in that report that I first detected this. eg:

Invalid Reference/either no pubmed id or no jnum (5): Reactome:R-MMU-1008243
Invalid Reference/either no pubmed id or no jnum (5): Reactome:R-MMU-1013867
Invalid Reference/either no pubmed id or no jnum (5): Reactome:R-MMU-1013873
Invalid Reference/either no pubmed id or no jnum (5): Reactome:R-MMU-111519
Invalid Reference/either no pubmed id or no jnum (5): Reactome:R-MMU-1168790
Invalid Reference/either no pubmed id or no jnum (5): Reactome:R-MMU-1168910

@leemdi
Copy link

leemdi commented Nov 9, 2023

when we process the GOA/Mouse, we save all of the Reactome rows (for which we do not have the reference in MGI) to a goamouse.gaf file. Then we append the goamouse.gaf to the end of our mgi/gaf.

@ukemi
Copy link
Contributor

ukemi commented Nov 9, 2023

Thanks @leemdi ! Yes, I saw that file. @deustp01 we should revisit this at a GO@MGI lab meeting.

@leemdi
Copy link

leemdi commented Nov 9, 2023

@sierra-moxon
you mean: the !gpa-version: 1.2 at the end of the merged_gpad_11_08_2023?

@leemdi
Copy link

leemdi commented Nov 9, 2023

@sierra-moxon

Looks like the ones in the 1.2 format, which I am skipping, are the Noctua ones. I think you have mentioned this earlier.
I am skipping them because I changed my code to process version 2.0, not 1.2.

example: Shh GO:0000122
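
A rough sketch of how a loader could keep only the 2.0 rows of a concatenated file, assuming each block in the merged file still begins with its own `!gpa-version:` header line. The function name is hypothetical and the sample rows are placeholders.

```python
from typing import Iterable, List


def rows_for_version(lines: Iterable[str], wanted: str = "2.0") -> List[str]:
    """Keep data rows that appear under a '!gpa-version: <wanted>' header."""
    current = None
    kept: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("!"):
            # Header lines switch the active version; other '!' lines are skipped.
            if line.startswith("!gpa-version:"):
                current = line.split(":", 1)[1].strip()
            continue
        if line and current == wanted:
            kept.append(line)
    return kept


sample = [
    "!gpa-version: 2.0",
    "row-a",
    "!gpa-version: 1.2",
    "row-b",
    "row-c",
]
print(rows_for_version(sample))
```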

@ukemi
Copy link
Contributor

ukemi commented Nov 9, 2023

@sierra-moxon and @leemdi
I am looking at the list of errors in which Uberon IDs were not converted to EMAPA IDs. For many, I don't see a mapping. But for some I do see an EMAPA xref, but can't find that ID.
Here is the list of errors:
https://docs.google.com/spreadsheets/d/1knEybI3QBkiaKHfBhIasKjOJpAqueDN5055_YKdcUFU/edit#gid=0

Looks like the mapping needs updating on the EMAPA/UBERON end. I have emailed Terry about this and sent her the list.

Terry says she will look at the list and open tickets for new mappings at UBERON. Since she has a much better background in all things anatomy than I do, this is a good plan.

@leemdi
Copy link

leemdi commented Nov 10, 2023

@sierra-moxon

the lines with multiple entries in field 7 contain '"'

this line is OK:
MGI:MGI:1919439 RO:0002327 GO:0005515 PMID:15102471 ECO:0000353 UniProtKB:O35305 2023-09-09 GO_Central

this line has '"' in field 7:
MGI:MGI:1333854 RO:0002327 GO:0005515 PMID:23478294 ECO:0000353 "UniProtKB:O35305,UniProtKB:P24604,UniProtKB:P35991,UniProtKB:Q78T81,UniProtKB:Q8CIH5"

is the '"' surrounding the multiple UniProtKB terms expected? or is this something you want to fix?

@ukemi
Copy link
Contributor

ukemi commented Nov 10, 2023

Hi @leemdi, I believe comma and pipe separated values in the 'with' field are allowed.
https://github.com/geneontology/go-annotation/blob/master/specs/gpad-gpi-2-0.md

@leemdi
Copy link

leemdi commented Nov 10, 2023

@ukemi @sierra-moxon
ok, that seems odd to me. if I'm processing field 7, then I would expect the delimiter, but not a leading/ending ".
but I can get rid of these on our end.
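
A small sketch of the consumer-side workaround, assuming the With/From field uses "|" between alternatives and "," within a group, as the linked spec appears to allow. The helper name is hypothetical.

```python
from typing import List


def parse_with_field(value: str) -> List[List[str]]:
    """Strip stray surrounding double quotes, then split on '|' and ','."""
    cleaned = value.strip().strip('"')
    if not cleaned:
        return []
    return [group.split(",") for group in cleaned.split("|")]


quoted = '"UniProtKB:O35305,UniProtKB:P24604,UniProtKB:P35991,UniProtKB:Q78T81,UniProtKB:Q8CIH5"'
print(parse_with_field(quoted))
print(parse_with_field("UniProtKB:O35305"))
```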

@ukemi
Copy link
Contributor

ukemi commented Nov 10, 2023

Sorry! I missed that point. I don't think there should be a ".

@sierra-moxon
Copy link
Member

Hi @sierra-moxon. I have a couple of questions to reassure myself that I didn't just think things without actually putting them into the requirements.

  1. We are not appending annotations to the mega-file when the UniProt identifier didn't map to an MGI gene identifier, but we would like a report of those that didn't.

yes, I can do that, but I don't have it done yet.

  1. We are processing the IEA annotations for mouse that would be in the non-isoform file consumed in pipeline #329. We filter out the IEAs in the ISO loads.

Yes, I include IEA annotations from GOA, but not from the orthology transformation loads (ISO loads)

@sierra-moxon
Copy link
Member

@sierra-moxon

the lines with multiple entries in field 7 contain '"'

this line is OK: MGI:MGI:1919439 RO:0002327 GO:0005515 PMID:15102471 ECO:0000353 UniProtKB:O35305 2023-09-09 GO_Central

this line has '"' in field 7: MGI:MGI:1333854 RO:0002327 GO:0005515 PMID:23478294 ECO:0000353 "UniProtKB:O35305,UniProtKB:P24604,UniProtKB:P35991,UniProtKB:Q78T81,UniProtKB:Q8CIH5"

is the '"' surrounding the multiple UniProtKB terms expected? or is this something you want to fix?

definitely I want to fix this! :) thank you for finding it.

@sierra-moxon
Copy link
Member

@sierra-moxon

Looks like the ones in the 1.2 format, which I am skipping, are the Noctua ones. I think you have mentioned this earlier. I am skipping them because I changed my code to process version 2.0, not 1.2.

example: Shh GO:0000122

gotcha - I am updating.

@leemdi
Copy link

leemdi commented Nov 10, 2023

@sierra-moxon
I have found some obsolete GO terms in the gpad. Is there any way to use a new MGI/Lori file?

133 annotations

examples:

GO:0000083 MGI:101934 J:164563 ISO UniProtKB:Q14186 GO_MGI 2023-03-22 MGI go_qualifier_id&=&RO:0002331&==&go_qualifier_term&=&involved_in&==&evidence&=&ECO:0000266

GO:0090305 MGI:102779 J:164563 ISO UniProtKB:P39748 GO_MGI 2016-09-14 MGI go_qualifier_id&=&RO:0002331&==&go_qualifier_term&=&involved_in&==&evidence&=&ECO:0000266

@sierra-moxon
Copy link
Member

absolutely! I will rerun with new files.

@leemdi
Copy link

leemdi commented Nov 10, 2023

@sierra-moxon
I am also finding quotes in gpad/field 11 (property).

@leemdi
Copy link

leemdi commented Nov 10, 2023

@sierra-moxon @ukemi

in field 11/properties

I am used to seeing things like this:
occurs_in(CL:0000622),occurs_in(EMAPA:18537),has_input(PR:Q01341)

now I am seeing this: 2018-07-23 GO_Central RO:0002233(RNAcentral:URS000075DA6B_10090),RO:0002233(RNAcentral:URS000075A5B2_10090),RO:0002233(RNAcentral:URS000075E1B6_10090),BFO:0000066(UBERON:0002107)

David, so, I assume that I need to map the RO or BFO to terms? But not sure what to do with the rest of the info in this new Property.

David: I have found a couple of duplicates in MGI/GO Property vocabulary.

occurs_at | BFO:0000066
occurs_in | BFO:0000066

happens_during | RO:0002092
during | RO:0002092

has_input | RO:0002233
results_in_division_of | RO:0002233
has_regulation_target | RO:0002233
has_direct_input | RO:0002233

has_target_end_location | RO:0002339
has_end_location | RO:0002339

exists_during | RO:0002491
existence_starts_and_ends_during | RO:0002491
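
One way to surface these collisions automatically is to group labels by relation ID. This is a minimal sketch using only the label/ID pairs listed above; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (label, relation ID) pairs copied from the list above.
VOCAB = [
    ("occurs_at", "BFO:0000066"),
    ("occurs_in", "BFO:0000066"),
    ("happens_during", "RO:0002092"),
    ("during", "RO:0002092"),
    ("has_input", "RO:0002233"),
    ("results_in_division_of", "RO:0002233"),
    ("has_regulation_target", "RO:0002233"),
    ("has_direct_input", "RO:0002233"),
    ("has_target_end_location", "RO:0002339"),
    ("has_end_location", "RO:0002339"),
    ("exists_during", "RO:0002491"),
    ("existence_starts_and_ends_during", "RO:0002491"),
]


def shared_ids(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return relation IDs that are attached to more than one label."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for label, rel_id in pairs:
        grouped[rel_id].append(label)
    return {rel_id: labels for rel_id, labels in grouped.items() if len(labels) > 1}


shared = shared_ids(VOCAB)
for rel_id, labels in shared.items():
    print(rel_id, labels)
```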

@leemdi
Copy link

leemdi commented Nov 13, 2023

@sierra-moxon
in the mgi.gpad (lori file), I see that in field 11, some of the delimiters are "," and some are "|".
is there a way to use the same delimiter? either "," or "|", but not a mix?
This would make it easier for me on our end.

MGI:MGI:108212 RO:0002331 GO:0042327 PMID:21052097 ECO:0000315 2014-07-25 GO_Central RO:0002092(GO:0060546)|RO:0002233(UniProtKB:Q63844)

MGI:MGI:108212 RO:0002331 GO:0070301 PMID:27258785 ECO:0000315 2018-11-27 GO_Central "BFO:0000066(CL:0000746),BFO:0000050(GO:0070301)"
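
Until the output is settled, a tolerant parser on the loading side could accept both delimiters and ignore stray quotes. This sketch assumes that "|" separates independent extension groups and "," joins expressions within one group, so the two are kept apart rather than merged; the names are hypothetical.

```python
import re
from typing import List, Tuple

# relation(target), e.g. BFO:0000066(CL:0000746)
EXTENSION = re.compile(r"([^\s(),|]+)\(([^()]+)\)")


def parse_extensions(field: str) -> List[List[Tuple[str, str]]]:
    """Return one list of (relation, target) pairs per '|'-separated group."""
    cleaned = field.strip().strip('"')
    groups: List[List[Tuple[str, str]]] = []
    for group in cleaned.split("|"):
        pairs = EXTENSION.findall(group)
        if pairs:
            groups.append(pairs)
    return groups


piped = "RO:0002092(GO:0060546)|RO:0002233(UniProtKB:Q63844)"
quoted_commas = '"BFO:0000066(CL:0000746),BFO:0000050(GO:0070301)"'
print(parse_extensions(piped))
print(parse_extensions(quoted_commas))
```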

@LiNiMGI
Copy link
Contributor

LiNiMGI commented Feb 27, 2024

@sierra-moxon In MGI, for the human to mouse and rat to mouse GO annotations we use the date of data loaded via orthology.

@sierra-moxon
Copy link
Member

@LiNiMGI - the original annotation in GOA for the first IKR example is:

UniProtKB	P60154	Rnase9	NOT|enables	GO:0004540	PMID:15676279	IKR		F	Inactive ribonuclease-like protein 9	Rnase9	protein	taxon:10090	20150710	GO_Central		

it has an assigned_by of GO_Central ... there was a requirement for this ingest to ignore annotations in protein to GO that are from "MGI", "GO_Central", or "GOC".

I think my script is behaving as designed, so I guess the next question is: if GO_Central provides this, we need to figure out where? (maybe it's in a different form or to a different ID or included in some other ingest? - asking around on my side - do you have any insight?)
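
The exclusion rule described above can be sketched roughly as follows, assuming a tab-separated GAF line with assigned_by in column 15. This is an illustration of the requirement, not the actual gopreprocess code; the function and constant names are hypothetical.

```python
EXCLUDED_PROVIDERS = {"MGI", "GO_Central", "GOC"}
ASSIGNED_BY_INDEX = 14  # zero-based index of GAF column 15


def keep_gaf_line(line: str) -> bool:
    """Return False for header lines, short lines, and excluded providers."""
    if line.startswith("!"):
        return False
    columns = line.rstrip("\n").split("\t")
    if len(columns) <= ASSIGNED_BY_INDEX:
        return False
    return columns[ASSIGNED_BY_INDEX] not in EXCLUDED_PROVIDERS


# The IKR example quoted above, rebuilt with explicit tabs.
ikr_columns = [
    "UniProtKB", "P60154", "Rnase9", "NOT|enables", "GO:0004540",
    "PMID:15676279", "IKR", "", "F", "Inactive ribonuclease-like protein 9",
    "Rnase9", "protein", "taxon:10090", "20150710", "GO_Central", "", "",
]
ikr_line = "\t".join(ikr_columns)

# Same row with a different provider, for contrast (illustrative only).
other_columns = list(ikr_columns)
other_columns[ASSIGNED_BY_INDEX] = "UniProt"
other_line = "\t".join(other_columns)

print(keep_gaf_line(ikr_line))
print(keep_gaf_line(other_line))
```

Under this rule the GO_Central row is dropped, which matches the behavior reported for the five IKR annotations.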

@kltm
Copy link
Member Author

kltm commented Feb 27, 2024

How many annotations are we talking here?

@LiNiMGI
Copy link
Contributor

LiNiMGI commented Feb 27, 2024

only 5 annotations, just wondering why they were not in the file...
@sierra-moxon Thanks for explaining that to me. Yes, your script is behaving as designed. I will talk to David tomorrow; maybe we just skip those?
Thanks,
Li

@sierra-moxon
Copy link
Member

@kltm @LiNiMGI - easy enough to skip. What worries me about the IKR issue is that the GOA annotation records the provider as "GO_Central", yet the final GPAD and GAF produced by this test pipeline are missing these annotations.

Tracing the annotation, it seems like it's something like this:
GO_Central (?in which GO internal pipeline or annotation step is this originally generated?) -> Protein2GO -> GoPreprocess (skip it) -> GO Pipeline -> final GPAD/GAF (missing)

So where is the original location of this annotation? (e.g. noctua, some sort of external ingest to noctua, etc.)

@LiNiMGI
Copy link
Contributor

LiNiMGI commented Feb 27, 2024

@sierra-moxon
I was looking into the GOA_mouse file; I suspect those are GO annotations made in Protein2GO by an MGI curator. There are a total of 58 annotations with EXP evidence codes (IDA, IMP, IPI, IKR). I will confirm and report back.

@ukemi
Copy link
Contributor

ukemi commented Feb 28, 2024

Here are some isoform annotations from the mouse Noctua GPAD:

PR Q60636-2 located_in GO:0005737 PMID:18845144 ECO:0000314 20101013 MGI contributor=https://orcid.org/0000-0003-2689-5511|model-state=production|noctua-model-id=gomodel:MGI_MGI_99655
PR Q60636-1 located_in GO:0005634 PMID:18845144 ECO:0000314 20101013 MGI contributor=https://orcid.org/0000-0003-2689-5511|model-state=production|noctua-model-id=gomodel:MGI_MGI_99655
PR Q60636-1 enables GO:0042826 PMID:18845144 ECO:0000314 20101013 MGI contributor=https://orcid.org/0000-0003-2689-5511|model-state=production|noctua-model-id=gomodel:MGI_MGI_99655
PR Q60636-1 enables GO:0043565 PMID:18845144 ECO:0000314 20101013 MGI contributor=https://orcid.org/0000-0003-2689-5511|model-state=production|noctua-model-id=gomodel:MGI_MGI_99655
PR Q60636-2 NOT|acts_upstream_of_or_within GO:0045579 PMID:18845144 ECO:0000314 20101013 MGI contributor=https://orcid.org/0000-0003-2689-5511|model-state=production|noctua-model-id=gomodel:MGI_MGI_99655
PR Q60636-2 enables GO:0005515 PMID:18845144 ECO:0000353 PR:Q60636-1 20120215 MGI has_input(PR:Q60636-1) contributor=https://orcid.org/0000-0003-2689-5511|model-state=production|noctua-model-id=gomodel:MGI_MGI_99655
PR Q60636-1 enables GO:0005515 PMID:18845144 ECO:0000353 PR:Q60636-2 20120215 MGI has_input(PR:Q60636-2) contributor=https://orcid.org/0000-0003-2689-5511|model-state=production|noctua-model-id=gomodel:MGI_MGI_99655
PR Q60636-1 acts_upstream_of_or_within GO:0030889 PMID:18845144 ECO:0000314 20101013 MGI contributor=https://orcid.org/0000-0003-2689-5511|model-state=production|noctua-model-id=gomodel:MGI_MGI_99655
PR Q60636-2 acts_upstream_of_or_within GO:0031665 PMID:18845144 ECO:0000314 20101013 MGI contributor=https://orcid.org/0000-0003-2689-5511|model-state=production|noctua-model-id=gomodel:MGI_MGI_99655
PR Q60636-2 enables GO:0005515 PMID:18845144 ECO:0000353 PR:Q60636-2 20120215 MGI has_input(PR:Q60636-2) contributor=https://orcid.org/0000-0003-2689-5511|model-state=production|noctua-model-id=gomodel:MGI_MGI_99655
PR Q60636-1 acts_upstream_of_or_within GO:0045579 PMID:18845144 ECO:0000314 20101013 MGI contributor=https://orcid.org/0000-0003-2689-5511|model-state=production|noctua-model-id=gomodel:MGI_MGI_99655
PR 000045550 enables GO:0005515 PMID:23326474 ECO:0000353 PR:P62259 20160817 MGI has_input(PR:P62259) contributor=https://orcid.org/0000-0003-2689-5511|model-state=production|noctua-model-id=gomodel:MGI_MGI_99701
PR 000045550 located_in GO:0005737 PMID:23326474 ECO:0000314 20160817 MGI part_of(CL:0000023) contributor=https://orcid.org/0000-0003-2689-5511|model-state=production|noctua-model-id=gomodel:MGI_MGI_99701
PR Q07813-1 acts_upstream_of_or_within GO:2001241 PMID:8358790 ECO:0000314 20130529 MGI contributor=https://orcid.org/0000-0001-5501-853X|noctua-model-id=gomodel:MGI_MGI_99702|model-state=production
PR P18581-2 enables GO:0097627 PMID:8195186 ECO:0000314 20140829 MGI contributor=https://orcid.org/0000-0001-7476-6306|contributor=https://orcid.org/0000-0001-5501-853X|noctua-model-id=gomodel:MGI_MGI_99828|model-state=production

@LiNiMGI
Copy link
Contributor

LiNiMGI commented Feb 28, 2024

Some relations in the Noctua GPAD are not being converted from labels to text correctly geneontology/go-annotation#5092

@leemdi
Copy link

leemdi commented Mar 8, 2024

will the new mgi.gpad, mgi.gaf files still be here:

https://current.geneontology.org/annotations/index.html

@kltm
Copy link
Member Author

kltm commented Mar 8, 2024

Once we are completely done with the project down the line, the files will appear at that location. (The test files are, naturally, elsewhere in the interim.)

@pgaudet
Copy link
Contributor

pgaudet commented Mar 13, 2024

Some relations in the Noctua GPAD are not being converted from labels to text correctly geneontology/go-annotation#5092

This is not blocking for the current project (MGI remainders)

@leemdi
Copy link

leemdi commented Mar 13, 2024

@kltm
per David, should we use "official" file to load into MGI or use a "snapshot" file?
if so, what will the URL of the snapshot file be?

official:
https://current.geneontology.org/annotations/index.html

snapshot:
??

@kltm
Copy link
Member Author

kltm commented Mar 13, 2024

@leemdi Both pipelines are in a bit of a state right now, but we have
snapshot https://snapshot.geneontology.org/annotations/ (which is "unchecked" (fewer eyes on it) and ephemeral)
and
current https://current.geneontology.org/annotations/ (the latest release, which has versioning, more qc, and is archived)

What you are using for any specific use case will be up to you.

@leemdi
Copy link

leemdi commented Mar 14, 2024

I don't really know what use case I would use to make this decision. From my perspective, I should use "current".
But I don't specifically know what is in snapshot that is not in current, and what the difference would be for our users. I don't have any specific cases in mind. This decision is really up to Li, I guess.

@LiNiMGI @ukemi , what do you suggest that we use going forward? "snapshot" or "current"?
Note that this will also affect, perhaps, what links we provide from our MGI-public-reports page.
Which is why I suggest "current", not "snapshot". But, it's up to you, Li.

@LiNiMGI
Copy link
Contributor

LiNiMGI commented Mar 19, 2024

Thanks!!! @sierra-moxon
I did a quick check:
1.
Noctua metadata GPAD emission issue:
I see noctua duplicates
one line with model ID/contributor/model state and annotation date.
one line without model ID/contributor/model state, but with a date: 2024-03-18
2.
GO_REF:0000096 J:155856 Rat to mouse
Date fixed; assigned_by still not fixed (change MGI to GO_Central)
GO_REF:0000119 J:164563 Human to mouse
Date fixed, assigned by fixed!
3.
GO_REF:0000033 annotations (PAINT), good!
4.
GOA mouse Isoform file: seems good!
5.
IKR annotation issue: fixed, looks good!

I will do more testing later today and tomorrow.
Li

@sierra-moxon
Copy link
Member

Please see: geneontology/gopreprocess#58 for the remaining issues.

Noting that the human and rat orthology loads use the same code that sets the provided_by to "GO_Central" in the preprocessing pipeline. And, when I look at the GAF file for the human and rat outputs of the preprocessing pipeline here: http://skyhook.berkeleybop.org/silver-issue-325-gopreprocess/products/upstream_and_raw_data/preprocessed_GAF_output/mgi-human-ortho.gaf and http://skyhook.berkeleybop.org/silver-issue-325-gopreprocess/products/upstream_and_raw_data/preprocessed_GAF_output/mgi-rgd-ortho.gaf

they both show ONLY GO_Central as the provider (as expected).

@leemdi
Copy link

leemdi commented Mar 19, 2024

here's an example:

one row has "MGI", one row has "GO_Central".
looks like a duplicate.

MGI:MGI:99961 RO:0002327 GO:0003700 GO_REF:0000096 ECO:0000266 RGD:620975 2024-03-18 MGI
MGI:MGI:99961 RO:0002327 GO:0003700 GO_REF:0000096 ECO:0000266 RGD:620975 2024-03-18 GO_Central
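
Rows like these can be found by grouping on every column except assigned_by. The sketch below assumes the 12-column tab-separated GPAD 2.0 layout with assigned_by in column 10; the helper name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

ASSIGNED_BY_INDEX = 9  # zero-based index of GPAD 2.0 column 10


def provider_only_duplicates(rows: Iterable[str]) -> Dict[Tuple[str, ...], List[str]]:
    """Group rows that are identical apart from the assigned_by column."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for row in rows:
        cols = row.rstrip("\n").split("\t")
        key = tuple(cols[:ASSIGNED_BY_INDEX] + cols[ASSIGNED_BY_INDEX + 1:])
        groups[key].append(cols[ASSIGNED_BY_INDEX])
    return {key: providers for key, providers in groups.items() if len(providers) > 1}


def make_row(provider: str) -> str:
    # The example pair from above, with empty columns made explicit.
    return "\t".join([
        "MGI:MGI:99961", "", "RO:0002327", "GO:0003700", "GO_REF:0000096",
        "ECO:0000266", "RGD:620975", "", "2024-03-18", provider, "", "",
    ])


dups = provider_only_duplicates([make_row("MGI"), make_row("GO_Central")])
for key, providers in dups.items():
    print(key[0], key[3], providers)
```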

@sierra-moxon
Copy link
Member

sierra-moxon commented Mar 19, 2024

right - I am tracking here: geneontology/gopreprocess#58
It looks like we get the same two annotations from the rat ortho load and from the protein to GO load (but the protein to GO load has the requirement to keep the provided_by the same, and not swap in GO_Central as we do for the ortho loads)
@LiNiMGI - please see this ticket to let me know what to do - the dups are coming in from the loosened constraint implemented for the IKR annotations.

@sierra-moxon
Copy link
Member

sierra-moxon commented Mar 19, 2024

noctua issue tracking here: geneontology/gopreprocess#59

When I run the validate.py produce command in ontobio locally, I do not see the duplicate noctua annotations being generated in the GPAD file:

SMoxon@SMoxon-M82 mgi % grep "MGI:MGI:2685011" mgi.gpad | grep "GO:0009653" | grep "PMID:26258302" mgi.gpad
MGI:MGI:2685011		RO:0002331	GO:0098609	PMID:26258302	ECO:0000315	MGI:MGI:4867020		2016-07-29	MGI	BFO:0000050(GO:0003183),BFO:0000050(GO:0007389),BFO:0000050(GO:0009653),BFO:0000050(GO:0016477),BFO:0000066(UBERON:0007151)	contributor=https://orcid.org/0000-0001-7476-6306|model-state=production|noctua-model-id=gomodel:56aac7ad00000038
MGI:MGI:2685011		RO:0002331	GO:0007389	PMID:26258302	ECO:0000315	MGI:MGI:4867020		2016-07-29	MGI	BFO:0000050(GO:0003183),BFO:0000050(GO:0009653),BFO:0000066(UBERON:0007151)	contributor=https://orcid.org/0000-0001-7476-6306|model-state=production|noctua-model-id=gomodel:56aac7ad00000038
MGI:MGI:2685011		RO:0002331	GO:0003183	PMID:26258302	ECO:0000315	MGI:MGI:4867020		2016-07-29	MGI		contributor=https://orcid.org/0000-0001-7476-6306|model-state=production|noctua-model-id=gomodel:56aac7ad00000038
MGI:MGI:2685011		RO:0002331	GO:0072659	PMID:26258302	ECO:0000315	MGI:MGI:4867020		2016-07-29	MGI	BFO:0000066(UBERON:0007151),RO:0002233(MGI:MGI:88355)	contributor=https://orcid.org/0000-0001-7476-6306|model-state=production|noctua-model-id=gomodel:56aac7ad00000038
MGI:MGI:2685011		RO:0002331	GO:0009653	PMID:26258302	ECO:0000315	MGI:MGI:4867020		2016-07-29	MGI	BFO:0000050(GO:0003183),RO:0002298(UBERON:0007151)	contributor=https://orcid.org/0000-0001-7476-6306|model-state=production|noctua-model-id=gomodel:56aac7ad00000038
MGI:MGI:2685011		RO:0002331	GO:0016477	PMID:26258302	ECO:0000315	MGI:MGI:4867020		2016-07-29	MGI	BFO:0000050(GO:0003183),BFO:0000050(GO:0007389),BFO:0000050(GO:0009653),BFO:0000066(UBERON:0007151)	contributor=https://orcid.org/0000-0001-7476-6306|model-state=production|noctua-model-id=gomodel:56aac7ad00000038
MGI:MGI:2685011		RO:0002264	GO:0098609	PMID:26258302	ECO:0000315	MGI:MGI:4867020		2016-02-11	MGI	BFO:0000066(EMAPA:18628)	contributor=https://orcid.org/0000-0001-7476-6306|model-state=production|noctua-model-id=gomodel:MGI_MGI_2685011
MGI:MGI:2685011		RO:0002264	GO:0016477	PMID:26258302	ECO:0000315	MGI:MGI:4867020		2016-02-11	MGI	BFO:0000066(EMAPA:18628)	contributor=https://orcid.org/0000-0001-7476-6306|model-state=production|noctua-model-id=gomodel:MGI_MGI_2685011
MGI:MGI:2685011		RO:0002264	GO:0007389	PMID:26258302	ECO:0000315	MGI:MGI:4867020		2016-02-11	MGI	BFO:0000066(EMAPA:18628)	contributor=https://orcid.org/0000-0001-7476-6306|model-state=production|noctua-model-id=gomodel:MGI_MGI_2685011
MGI:MGI:2685011		RO:0002264	GO:0003183	PMID:26258302	ECO:0000315	MGI:MGI:4867020		2016-02-11	MGI		contributor=https://orcid.org/0000-0001-7476-6306|model-state=production|noctua-model-id=gomodel:MGI_MGI_2685011
MGI:MGI:2685011		RO:0002264	GO:0072659	PMID:26258302	ECO:0000315	MGI:MGI:4867020		2016-02-11	MGI	BFO:0000066(EMAPA:18628)	contributor=https://orcid.org/0000-0001-7476-6306|model-state=production|noctua-model-id=gomodel:MGI_MGI_2685011

there must be another step in the pipeline...
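
For a stricter check than chained grep, which matches a pattern anywhere in the line (and reads the named file instead of the pipe when a file argument is given), a column-aware filter may be easier to reason about. This is a sketch assuming tab-separated GPAD 2.0 rows; the function name is hypothetical and the sample rows are shortened versions of two rows from the output above.

```python
from typing import Iterable, List


def filter_gpad(lines: Iterable[str], subject: str, go_term: str, reference: str) -> List[str]:
    """Keep rows whose columns 1, 4 and 5 match exactly."""
    hits: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("!"):
            continue
        cols = line.split("\t")
        if len(cols) < 5:
            continue
        if cols[0] == subject and cols[3] == go_term and cols[4] == reference:
            hits.append(line)
    return hits


sample_rows = [
    "\t".join(["MGI:MGI:2685011", "", "RO:0002331", "GO:0009653", "PMID:26258302",
               "ECO:0000315", "MGI:MGI:4867020", "", "2016-07-29", "MGI",
               "BFO:0000050(GO:0003183),RO:0002298(UBERON:0007151)", ""]),
    "\t".join(["MGI:MGI:2685011", "", "RO:0002331", "GO:0007389", "PMID:26258302",
               "ECO:0000315", "MGI:MGI:4867020", "", "2016-07-29", "MGI",
               "BFO:0000050(GO:0003183),BFO:0000050(GO:0009653),BFO:0000066(UBERON:0007151)", ""]),
]
hits = filter_gpad(sample_rows, "MGI:MGI:2685011", "GO:0009653", "PMID:26258302")
print(len(hits))
```

The second sample row mentions GO:0009653 only inside its extension column, so a plain text match would keep it while the column check does not.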

@sierra-moxon
Copy link
Member

Next round of files are out for review: http://skyhook.berkeleybop.org/full-issue-325-gopreprocess/annotations/mgi.gpad.gz

@leemdi
Copy link

leemdi commented Mar 20, 2024

thanks, Sierra. Processed on MGi-Scrum, and sent to Li for review

@LiNiMGI
Copy link
Contributor

LiNiMGI commented Mar 20, 2024

@sierra-moxon

Annotation date for the below annotations should not be changed to the loading date, we should keep the original annotation date:
MGI:MGI:101757 | RO:0001025 | GO:0005829 | Reactome:R-MMU-482767 | ECO:0000304 | 3/19/24 | Reactome |   |   |  
MGI:MGI:101757 | RO:0001025 | GO:0030027 | PMID:25107909 | ECO:0000314 | 3/19/24 | UniProt |   |   |  
MGI:MGI:101757 | RO:0002327 | GO:0005102 | PMID:24052308 | ECO:0000353 | UniProtKB:P97484 | 3/19/24 | ARUK-UCL | BFO:0000066(UBERON:0001890)
MGI:MGI:101757 | RO:0002331 | GO:0030836 | PMID:23921380 | ECO:0000314 | 3/19/24 | CACAO |   |   |  
MGI:MGI:101781 | RO:0002327 | GO:0005515 | PMID:16533813 | ECO:0000353 | UniProtKB:P25100 | 3/19/24 | IntAct
MGI:MGI:101782 BFO:0000050 GO:0034706 GO_REF:0000114 ECO:0000266 ComplexPortal:CPX-314 3/19/24 ComplexPortal

I think these are all from the GOA mouse file.

we should only change the date of rat to mouse/human to mouse.
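
The date rule requested here can be written down as a small function. This is only a sketch, assuming the two orthology loads are identified by the GO_REFs mentioned earlier in this thread (GO_REF:0000096 for rat to mouse, GO_REF:0000119 for human to mouse); the function name and the sample dates are illustrative.

```python
# References for the orthology-derived loads, as listed earlier in this thread.
ORTHOLOGY_REFS = {"GO_REF:0000096", "GO_REF:0000119"}


def annotation_date(reference: str, original_date: str, load_date: str) -> str:
    """Use the load date only for orthology loads; otherwise keep the original date."""
    return load_date if reference in ORTHOLOGY_REFS else original_date


# Illustrative dates only.
print(annotation_date("GO_REF:0000096", "2020-01-01", "2024-03-19"))
print(annotation_date("PMID:25107909", "2020-01-01", "2024-03-19"))
```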

@sierra-moxon
Copy link
Member

sierra-moxon commented Mar 20, 2024

tracking date change issue for protein to GO files here: geneontology/gopreprocess#58

@sierra-moxon
Copy link
Member

noting that the provided_by and noctua duplicates issues have been resolved here: geneontology/gopreprocess#53 and geneontology/gopreprocess#59 respectively

@pgaudet
Copy link
Contributor

pgaudet commented Apr 12, 2024

@sierra-moxon If all tasks here are done or moved, can we then close this ticket?

@sierra-moxon
Copy link
Member

yes, I think so.

@github-project-automation github-project-automation bot moved this from Ready to be injected into product (MGI GPAD) to Done in Integrate remainder of MGI pipeline into the GO pipeline Apr 12, 2024
Labels
None yet
Development

No branches or pull requests

9 participants