GitHub versus specialized platform (database) #33
Replies: 15 comments
-
@Midnighter thanks for bringing up this important topic. IMO, one of the major advantages of using GitHub for hosting GEMs is the transparent and well-documented curation process, which is essential for the long-term evolution of the field.
-
Excellent overview @Midnighter. To add to it, to me it's more about Although model changes can be done manually, in many cases they are done via scripts. I'm unsure which platforms allow code sharing together with model changes. By keeping these together we increase reproducibility.
Some of the cons of GitHub can be addressed via
-
@mihai-sysbio very good point. Although
-
Yes, I agree. This is a very important feature. (And one that @zakandrewking had approached beautifully.)
-
I just want to clarify questions about access limits in KBase.
So KBase is completely open and free for any user to sign up to run the tools, contribute data, and retrieve data. There are no restrictions on that, and I don’t see that ever changing.
There are light restrictions on who is allowed to contribute code via our SDK mandated upon us by DOE. DOE is somewhat finicky about who is allowed to contribute code that runs on DOE machines. Basically, this process involves filling out a form (accounts.kbase.us) to get a developer account on KBase.
All that said, while I would love to see KBase serve as a model atlas, I also see the advantages of a GitHub-based system (we use GitHub for our ModelSEED biochemistry database). KBase does do versioning on all objects in its data store, but it doesn’t currently offer the rich tooling for tracking contributions and doing diffs that GitHub does. There’s also nothing stopping platforms like KBase or PathwayTools from linking deeply to a GitHub resource. I know I would be interested in adding apps in KBase to automatically import from such a site if it were created.
One thing I would consider to be of utmost importance in such a site is to properly represent the genomes linked to the models. Ideally, I would prefer to see the site maintain its own internal compressed copies of GFF and FASTA files for genomes associated with any models stored there. People routinely use genome IDs… but these IDs go away or genes get recalled and it makes things difficult. I would argue a model is nearly useless without its associated genome, and finding the exact correct genome that should be mapped to a particular published model is one of my greatest pain points in trying to use these models in my own research. You could store protein sequences in the model, which would help, but without the genome, you’re still losing some provenance on where the protein came from.
edit: removed the email body.
-
One thing to keep in mind is that My personal experience of reading and reviewing papers is that there are many models that are not generated by any of the platforms mentioned above, but rather by COBRA, cobrapy, RAVEN, etc., using custom scripts. This is particularly the case for curation of existing models. These models are now often only distributed as a final SBML file in the Supplementary Material (and perhaps submitted to BioModels Database). Some of them are already on GitHub, but the format and content of these repositories vary widely. Regardless, an important aspect of We currently have/work on functions to write the correct files in the right format for COBRA/RAVEN/cobrapy, but maybe this can be expanded by having such functionality for the other platforms as well.
-
I would love to hear more about the philosophy behind a "GitHub for GEMs" @zakandrewking, and how that improves upon the philosophy behind eg MEMOsys. Reading between the lines, I see some consensus between:
From this perspective,
-
Just one question about line-based diff tools such as git: many standardized file formats are based on XML, which does not require a fixed order of its contained elements. For constraint-based modeling, SBML has become most effective with the package FBC (flux-balance constraints), which is only directly supported since Level 3 Version 1. While earlier SBML specifications pointed out that the order of its elements is significant, the more recent specifications (since L3V1) explicitly mention that this is no longer the case. Consequently, line-based diff tools such as git might not be able to identify and track changes if users scramble up the order of model elements. How could a "GitHub for GEMs"-approach deal with that problem?
-
I would think you should create API scripts to process and check formats... which could handle sorting to make the files more diff-able. These scripts could also handle validation. Running these scripts should be a prerequisite to getting a PR accepted with modifications to a model. Running tools like memote could be bundled in to do QA/QC. This is the direction we’ve gone with ModelSEEDDatabase, which is on GitHub.
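To make the sorting idea concrete, here is a minimal sketch of such a script using only the Python standard library. It is not how ModelSEEDDatabase or memote do it; the function names (`canonical_sbml`, `sort_key`) are made up for illustration, and the choice to sort every `listOf*` container by `id` (or by `species` for species references) is an assumption. It requires Python 3.9+ for `ET.indent`.

```python
import xml.etree.ElementTree as ET

# Keep SBML's usual prefixes when serializing (L3V1 core and FBC version 2).
ET.register_namespace("", "http://www.sbml.org/sbml/level3/version1/core")
ET.register_namespace("fbc", "http://www.sbml.org/sbml/level3/version1/fbc/version2")


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def sort_key(element):
    """Order by tag, then id (or species); fall back to the serialized element."""
    identifier = element.get("id") or element.get("species")
    if identifier is None:
        identifier = ET.tostring(element, encoding="unicode")
    return (local_name(element.tag), identifier)


def canonical_sbml(sbml_text):
    """Return SBML text with all listOf* children sorted and uniform indentation."""
    root = ET.fromstring(sbml_text)
    # Drop whitespace-only text/tails so the original layout cannot leak through.
    for element in root.iter():
        if element.text is not None and not element.text.strip():
            element.text = None
        if element.tail is not None and not element.tail.strip():
            element.tail = None
    # Reverse document order: nested lists are sorted before their ancestors.
    for parent in reversed(list(root.iter())):
        if local_name(parent.tag).startswith("listOf"):
            parent[:] = sorted(parent, key=sort_key)
    ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode") + "\n"
```

Two files that differ only in element order should come out identical after `canonical_sbml`, so a check in CI could compare the committed file against its canonical form and reject the PR if they differ. Attribute order is not normalized here, and validation (e.g. via memote) would be a separate step.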
-
@draeger the ordering of elements in SBML (L3V1+) is something we indeed need to be mindful of. As @cshenry points out, as soon as there is a standard for repositories, one can create all sorts of systems for validation, e.g. the automated-validation branch of this repository. Alternatively, standard-GEM could provide workflow scripts for GitHub Actions that do this validation inside each repository, if that would be interesting.
-
@draeger @mihai-sysbio you could rely on memote's approach of using YAML files (generated in addition to the SBML files) to facilitate easier line-based diff?
-
I support using YAML files for easier diff. I guess the YAML file can be created by a pre-commit hook that also sorts elements and annotations.
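A rough sketch of what such a hook could look like, under several assumptions: models are staged as `.xml` files, the sidecar is written next to the model, and indented JSON stands in for YAML only because the Python standard library has no YAML writer (a real hook would more likely call cobrapy's or memote's own export). All function names are hypothetical, and annotations are left out for brevity.

```python
#!/usr/bin/env python3
"""Hypothetical .git/hooks/pre-commit: write a sorted sidecar per staged SBML file."""
import json
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag):
    return tag.rsplit("}", 1)[-1]


def reaction_entry(reaction):
    """Reactants get negative coefficients, products positive ones."""
    metabolites = {}
    for child in reaction:
        sign = {"listOfReactants": -1.0, "listOfProducts": 1.0}.get(local_name(child.tag))
        if sign is None:
            continue
        for ref in child:
            # Assumption: a missing stoichiometry attribute is treated as 1.
            coefficient = float(ref.get("stoichiometry", "1"))
            metabolites[ref.get("species", "")] = sign * coefficient
    return {"id": reaction.get("id"), "name": reaction.get("name"), "metabolites": metabolites}


def model_summary(sbml_text):
    """Collect species and reactions into plain dicts, each list sorted by id."""
    root = ET.fromstring(sbml_text)
    summary = {"species": [], "reactions": []}
    for element in root.iter():
        name = local_name(element.tag)
        if name == "species":
            summary["species"].append({"id": element.get("id"), "name": element.get("name")})
        elif name == "reaction":
            summary["reactions"].append(reaction_entry(element))
    for items in summary.values():
        items.sort(key=lambda item: item["id"] or "")
    return summary


def staged_sbml_files():
    """List added, copied, or modified files in the index that end in .xml."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True, capture_output=True, text=True,
    )
    return [Path(line) for line in result.stdout.splitlines() if line.endswith(".xml")]


def main():
    for path in staged_sbml_files():
        # Note: this reads the working-tree copy, not the staged blob.
        summary = model_summary(path.read_text(encoding="utf-8"))
        sidecar = path.with_suffix(".json")
        sidecar.write_text(json.dumps(summary, indent=2, sort_keys=True) + "\n", encoding="utf-8")
        subprocess.run(["git", "add", str(sidecar)], check=True)
    return 0

# In the actual hook file, finish with: raise SystemExit(main())
```

Because the lists are sorted by id and keys are sorted on output, reordering elements in the SBML file leaves the sidecar unchanged, so its diff only shows real changes.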
-
Personally, I also think that SBtab has a lot of potential. It is supposed to be compatible with SBML but provides a view of the models suitable for exchange via spreadsheet programs such as Excel. For model development, the row-based SBtab also has the advantage that it can be directly read and understood (not only by machines but also by users), and most people who work in the lab are familiar with Excel. Changes in such a format could also be understood by line-based comparison tools such as Git. SBtab could be used for model development and be exported to SBML for analysis.
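To illustrate why a row-based table diffs well, here is a small sketch. It writes a generic tab-separated table in the spirit of SBtab, not a file that follows the SBtab specification; the column names and the helper names `to_table` and `changed_rows` are made up for this example.

```python
import csv
import difflib
import io

# Illustrative columns only; these are not taken from the SBtab specification.
COLUMNS = ["ID", "Name", "Formula"]


def to_table(rows):
    """Serialize rows as tab-separated text, one entity per line, sorted by ID."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r["ID"]):
        writer.writerow(row)
    return buffer.getvalue()


def changed_rows(old_rows, new_rows):
    """Return the added (+) and removed (-) lines of a line-based diff."""
    diff = list(difflib.unified_diff(
        to_table(old_rows).splitlines(),
        to_table(new_rows).splitlines(),
        lineterm="",
        n=0,
    ))
    # The first two lines are the '---' / '+++' file headers.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]
```

With one row per reaction and a fixed sort order, renaming a single reaction shows up as one removed and one added line, and a pure reordering of the input rows produces no diff at all.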
-
I think so, too; there is a lot of potential for SBtab and ObjTables (which I understand as the spiritual successor to SBtab). Indeed, Wolfram Liebermeister asked whether direct support for SBtab could be added to cobrapy.
-
This is a very valuable discussion to have. At the moment, however, I feel it is hard to formulate action points, so I am going to convert this to a Discussion. When actionable items arise, issues can be created from the discussion.
-
This is a big discussion to have. There are other approaches to standardization and it is important to clearly lay out the pros and cons.
Platforms
Pros
I think such platforms have a lot to offer.
Probably the platform maintainers can come up with more good reasons.
Cons
We are pouring a lot of public money into building metabolic models and maintaining the tools to work with them. As such, these platforms need to fulfil a number of criteria in my opinion:
vs GitHub
Pros
Cons
These are just some preliminary thoughts. Happy to hear your thoughts and points of view. Either way, I think more effort is needed in this area and I'm glad that you're rising to the challenge.