Documentation of model development #20
What an interesting question! A Git-based workflow allows for versioning of both code and model. For the model, there would be an input model (previous commit) and an output model (new commit). Following the approach described above sounds like a great idea. I'm not sure what would be easy enough, though, but I feel it ought to involve some way of gluing together the models and the code.
If you have the energy to guide and maintain it, I think a public gitbook could be a great place for such a guide, since it could then be continuously updated by the community. It takes effort to steer such a project and keep it a comprehensible whole, though.
Documentation of model curation is essential in GEM development. A well-defined Git-based workflow would help achieve this goal and should therefore be within the scope of
@Midnighter I am not familiar with gitbook, but from a brief look it seems like it might be too much work, and something that is not necessarily maintained along with the model on GitHub. Maybe a more realistic option is to create templates for model reconstruction scripts (e.g. in MATLAB or Python) that ensure a minimum of documentation along with the reconstruction.
Along those lines, we could start thinking about a minimum information requirement that should be reported about the steps taken to create a GEM. Such guidelines already exist for various other aspects of science; in systems biology MIRIAM is a prominent example, but there are plenty of others. Of course, there is Ines Thiele's well-known protocol for generating a high-quality GEM, but we could start collecting key points on what needs to go into the kind of documentation that @sulheim requests.
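To make the template idea a bit more concrete, here is a minimal Python sketch of a curation script that refuses to run an undocumented step: each step must declare a rationale (and optionally a reference), and a log records which reaction IDs the step added or removed. The `CurationLog` class, the `step` decorator and the plain dict used as a model (reaction ID -> equation) are hypothetical and only for illustration; a real script would operate on a COBRApy or COBRA Toolbox model, and this log tracks only added/removed IDs, not edits to existing reactions.

```python
import functools
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for a GEM: reaction ID -> reaction equation.
Model = Dict[str, str]


@dataclass
class CurationLog:
    """Collects one entry per curation step so the script documents itself."""

    entries: List[dict] = field(default_factory=list)

    def step(self, rationale: str, reference: str = "") -> Callable:
        """Decorator that rejects undocumented steps and records their effect."""
        if not rationale.strip():
            raise ValueError("every curation step needs a rationale")

        def wrap(func: Callable[[Model], Model]) -> Callable[[Model], Model]:
            @functools.wraps(func)
            def run(model: Model) -> Model:
                before = set(model)
                # Work on a copy so the caller's model is left untouched.
                result = func(dict(model))
                after = set(result)
                # Only additions/removals of reaction IDs are recorded here,
                # not changes to the equations of existing reactions.
                self.entries.append({
                    "step": func.__name__,
                    "rationale": rationale,
                    "reference": reference,
                    "added": sorted(after - before),
                    "removed": sorted(before - after),
                })
                return result

            return run

        return wrap


# Example usage with made-up reaction IDs.
log = CurationLog()


@log.step("R2 duplicates R1", reference="hypothetical issue link")
def remove_duplicate_reaction(model: Model) -> Model:
    model.pop("R2", None)
    return model


curated = remove_duplicate_reaction({"R1": "A -> B", "R2": "A -> B"})
for entry in log.entries:
    print(entry)
```

Printing `log.entries` at the end of a run gives a human-readable record of what was done and why, which could be committed next to the model.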
I agree; gitbook was my recommendation for a more general meta-guide on how to construct models, not to serve as documentation alongside one specific model.
In this context, I would like to discuss how one should organize model reconstruction and curation scripts as well as model files. We have encountered an issue: it is rather inconvenient to test or update curation scripts that have previously been used to update the model, because the model file in the repository is always the latest version. For example, if you previously wrote and applied a script that deletes a few model reactions, and you now want to modify and rerun that script, you cannot test it on the model file in the repository, since those reactions are already gone. One solution is to keep an archive folder with previous model versions, but I believe there might be smarter solutions to this issue. What do you think?
But would an
or latest
Somewhat related to this, we are implementing an approach with Human-GEM to deal with old curation/reconstruction scripts that do not work with the current model version by moving them to a If one wanted to run an archived script, they could check out the commit in which the script was last modified or used; the model version at that commit would presumably be compatible with the script.
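The lookup described above can also be scripted rather than done by hand. The sketch below (Python, calling the `git` command line through `subprocess`) finds the last commit that touched a script with `git log -1 --format=%H -- <path>` and then reads the model file as it was in that commit with `git show <commit>:<path>`, without checking anything out. The function names and any paths are hypothetical, and both paths are assumed to be relative to the repository root.

```python
import subprocess
from typing import List


def last_commit_cmd(script_path: str) -> List[str]:
    """git command printing the hash of the last commit that touched script_path."""
    return ["git", "log", "-1", "--format=%H", "--", script_path]


def show_file_cmd(commit: str, model_path: str) -> List[str]:
    """git command printing model_path as stored in the given commit."""
    # <commit>:<path> is resolved relative to the repository root.
    return ["git", "show", f"{commit}:{model_path}"]


def _git(cmd: List[str], repo: str) -> str:
    """Run a git command inside repo (assumed to be the repo root) and return stdout."""
    return subprocess.run(
        cmd, cwd=repo, check=True, capture_output=True, text=True
    ).stdout


def model_for_script(script_path: str, model_path: str, repo: str = ".") -> str:
    """Return the model file contents from the commit where script_path last changed."""
    commit = _git(last_commit_cmd(script_path), repo).strip()
    if not commit:
        raise ValueError(f"no commit in history touches {script_path}")
    return _git(show_file_cmd(commit, model_path), repo)
```

For example, `model_for_script("code/old_script.py", "model/model.yml")` (hypothetical paths) returns the model text, which can be written to a temporary file and loaded to test the script. If the script and the model changes it produced were committed together, the model at that commit already contains those edits; in that case using the parent commit, `f"{commit}^"`, gives the version the script originally ran against.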
I don't think an
@sulheim, a YAML-based workflow implemented in Previously, we also used scripts for adding/removing reactions and making other changes to the model. As @JonathanRob mentioned, we are now archiving the old code and retiring the script-based approach. In the new workflow, only a YAML-format model file is kept in develop and in the fix/feature branches. Because YAML is human-readable, changes made to the YAML file are evident and clear enough that curation no longer has to depend on scripts. For example, in PR #213 a number of duplicated metabolites and reactions were removed through a series of commits, each of which resolves one duplicated metabolite. In particular, the metabolite This workflow is still being developed and refined, but it seems to work pretty well so far.
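In such a YAML-only workflow, reviewing a commit mostly comes down to seeing what it changed. As a rough sketch, assume the YAML file at two commits has already been parsed into Python dicts (for example with PyYAML or COBRApy) and that each has top-level `metabolites`, `reactions` and `genes` lists whose entries carry an `id`. This layout is an assumption loosely modeled on COBRApy's dictionary representation; Human-GEM's actual YAML structure may differ. The hypothetical function below then reports which IDs were added or removed.

```python
from typing import Dict, List, Set, Tuple


def ids(model: dict, section: str) -> Set[str]:
    """Collect the 'id' of every entry in one section, e.g. 'metabolites'."""
    return {entry["id"] for entry in model.get(section, [])}


def summarize_change(
    old: dict,
    new: dict,
    sections: Tuple[str, ...] = ("metabolites", "reactions", "genes"),
) -> Dict[str, Dict[str, List[str]]]:
    """Report IDs added or removed between two parsed versions of a model."""
    summary = {}
    for section in sections:
        before, after = ids(old, section), ids(new, section)
        if before != after:  # only list sections that actually changed
            summary[section] = {
                "added": sorted(after - before),
                "removed": sorted(before - after),
            }
    return summary


# Made-up example: one commit that removes a duplicated metabolite
# and the reaction that used it.
old = {
    "metabolites": [{"id": "m1"}, {"id": "m1_dup"}],
    "reactions": [{"id": "r1"}, {"id": "r2"}],
}
new = {
    "metabolites": [{"id": "m1"}],
    "reactions": [{"id": "r1"}],
}
print(summarize_change(old, new))
```

Because only IDs are compared, changes inside an entry (e.g. a corrected stoichiometry) would not show up here; a plain `git diff` of the YAML file covers those.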
That's an interesting workflow, @Hao-Chalmers. Although I understand that one can reproduce the model development by going through each individual commit and redoing the manual curations, it sounds less tidy than having all edits documented in a script. I think it makes sense to keep only the YAML format in the devel folder, though (but is that compatible with the COBRA Toolbox in MATLAB?).
@sulheim The YAML format was actually adapted from COBRApy, so you have built-in support in Python when using COBRA. Yes, you can keep only the YAML file in
Reading this issue again, I find it an interesting discussion to continue. However, constructing a roadmap that enables full reproducibility, including all of the curation, would be very ambitious. Therefore, I propose to add the use of the
OK. But isn't
The message I am trying to send is in line with the definitions in the Merriam-Webster dictionary:
That being said, I would appreciate more thoughts on the matter, especially since I feel more people are familiar with the term
Description of the issue:
It is not clear to me whether this is outside the scope of standard-GEM, but I think it would be useful to come up with a language-agnostic guideline/template for documenting the development process, so that it is easy for anyone to understand what the model reconstruction does, how it is performed, and how one can reproduce the current state of the model. One common practice (which I have used) is to have a script that performs the complete model reconstruction from any given starting point. This works reasonably well, but it is still not trivial for someone else to trace the reconstruction unless the code is very well documented.
What do you think is the best practice that should be recommended to users of standard-GEM?