@cmungall
Based on discussion during the Jan 30th, 2025 Monarch Data call, we need to decide where we want to get gene-gene interaction data from. Is StringDB meeting our needs, or do we want to move toward more upstream data sources? At a minimum, I believe we want to ensure we are pruning out text-mined information.
Here are some notes:
What files are currently coming from String, and what data do they contain?
From my notes, we are ingesting the following 14 String files.
Looking at the format of the "protein.links.detailed" files from StringDB, they have columns for "experimental", "database", "textmining", and "combined" scores, alongside other evidence channels such as "neighborhood" and "coexpression".
How is the combined score calculated by String?
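Based on the String FAQ (http://version10.string-db.org/help/faq/), the channel scores are not simply weighted and summed. Instead, a shared prior (the probability of observing an interaction at random) is removed from each channel score, the corrected scores are combined as independent probabilities, and the prior is added back once at the end.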
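As a rough illustration, here is a Python sketch of that kind of combination. It assumes the scheme described in STRING's documentation rather than a plain weighted sum: each channel score (0 to 1000 in the files) is rescaled to 0 to 1, a shared prior (assumed here to be 0.041) is removed, the corrected scores are combined as independent probabilities, and the prior is added back once. STRING's own pipeline may apply further per-channel corrections, so this sketch is not guaranteed to reproduce the published combined scores exactly. The function names and the `exclude` parameter are hypothetical, added to show how a score could be recomputed without text mining.

```python
from collections.abc import Collection, Mapping

# Assumed prior probability of a random interaction, per STRING's documentation.
PRIOR = 0.041


def remove_prior(score: float, prior: float = PRIOR) -> float:
    """Remove the shared prior from one channel score on a 0-1 scale."""
    score = max(score, prior)  # scores at or below the prior carry no evidence
    return (score - prior) / (1 - prior)


def combine_scores(
    channel_scores: Mapping[str, int],
    exclude: Collection[str] = (),
) -> int:
    """Combine per-channel scores (0-1000) into one score (0-1000).

    Prior-corrected channels are treated as independent evidence:
    combined = 1 - prod(1 - s_i); the prior is then added back once.
    Channels named in `exclude` (e.g. "textmining") are skipped.
    """
    prob_no_evidence = 1.0
    for channel, raw in channel_scores.items():
        if channel in exclude:
            continue
        prob_no_evidence *= 1 - remove_prior(raw / 1000)
    combined = 1 - prob_no_evidence
    combined += PRIOR * (1 - combined)  # add the prior back once
    return round(combined * 1000)


scores = {"experimental": 300, "database": 0, "textmining": 900}  # made-up values
print(combine_scores(scores))                          # all channels
print(combine_scores(scores, exclude={"textmining"}))  # text mining dropped
```

With these made-up channel scores, the sketch gives 927 with all channels and 300 once text mining is excluded, which shows how much of a combined score can come from text mining alone. A single channel passes through unchanged, because removing and re-adding the prior cancel out.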
What processing is currently applied to StringDB ingests?
We can look at the processing step for String here:
https://github.com/monarch-initiative/monarch-ingest/blob/24d9e972b9cbb5263dc6f1f5380afc23cfc32cf3/src/monarch_ingest/ingests/string/protein_links.yaml#L46-L51. Our ingest filters on the combined score, which includes text-mining evidence. It may be better to require that the "experimental" column be greater than 0, or to use some other curation methodology.
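A minimal Python sketch of the "experimental > 0" alternative, assuming the space-separated, header-first layout of STRING's detailed links files. The protein IDs and scores in the sample are made up, and `filter_links`, `channels`, and `min_score` are hypothetical names, not part of the monarch-ingest code. Rows are kept based only on the listed evidence channels, so a pair supported solely by text mining is dropped regardless of its combined score.

```python
import io
from collections.abc import Iterable, Iterator

# Made-up sample in the assumed layout of a protein.links.detailed file:
# a space-separated header row followed by one row per protein pair.
SAMPLE = (
    "protein1 protein2 neighborhood fusion cooccurence coexpression "
    "experimental database textmining combined_score\n"
    "9606.PROT_A 9606.PROT_B 0 0 0 0 0 0 950 950\n"
    "9606.PROT_A 9606.PROT_C 0 0 0 62 310 0 120 380\n"
)


def filter_links(
    lines: Iterable[str],
    channels: tuple[str, ...] = ("experimental",),
    min_score: int = 1,
) -> Iterator[dict[str, str]]:
    """Yield rows where at least one listed channel scores >= min_score.

    Text mining only counts if "textmining" is included in `channels`.
    """
    rows = iter(lines)
    header = next(rows).split()  # the first line names the columns
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, fields))
        if any(int(row[channel]) >= min_score for channel in channels):
            yield row


for link in filter_links(io.StringIO(SAMPLE)):
    print(link["protein1"], link["protein2"], link["experimental"])
```

Passing `channels=("experimental", "database")` would be one example of another curation rule, keeping pairs backed by either experiments or curated databases. `min_score` could also be raised above 1 if a bare nonzero experimental score turns out to be too permissive.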