Documentation of model development #20

Open
sulheim opened this issue Aug 27, 2020 · 16 comments
Labels
workflow workflow for curating GEM-type repos

Comments

@sulheim

sulheim commented Aug 27, 2020

Description of the issue:

It is not clear to me whether this is outside the scope of standard-GEM, but I think it would be useful to come up with a language-agnostic guideline / template for documenting the development process, so that it is easy for anyone to understand what was done during model reconstruction and how, and how to reproduce the current state of the model. One common practice (that I've used) is to have a script which performs the complete model reconstruction from any given starting point. This works reasonably well, but it is still not trivial for someone else to trace the reconstruction unless the code is very well documented.

What do you think is the best practice that should be recommended to users of standard-GEM?

@mihai-sysbio
Member

What an interesting question! A git-based workflow allows for versioning of code and model. For the model, there will be an input model (previous commit) and an output model (new commit). It sounds like a great idea to follow the approach described above. I'm not sure what would be easy enough, though; I feel it ought to involve some way of gluing together the models and the code.

@Midnighter
Collaborator

If you have the energy to guide and maintain it, I think a public gitbook could be a great place for such a guide. That way it can be continuously updated by the community. It does take work to steer such an effort and maintain a comprehensible whole, though.

haowang-bioinfo added the workflow label (workflow for curating GEM-type repos) Sep 29, 2020
@haowang-bioinfo
Member

Documentation of model curation is essential in GEM development. A well-defined Git-based workflow would help in achieving this goal and should therefore be within the scope of standard-GEM.

@sulheim
Author

sulheim commented Oct 26, 2020

@Midnighter I am not familiar with gitbook, but from a brief look it seems like it might be too much work, and something that is not necessarily maintained along with the model on GitHub. Maybe a more realistic option is to create templates for model reconstruction scripts (e.g. in MATLAB or Python) that ensure a minimum of documentation along with the reconstruction.
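
As a rough illustration of what such a script template could look like, here is a minimal Python sketch. Everything in it is hypothetical and not part of standard-GEM: the `Model` type is a plain dictionary standing in for a real model object, and `curation_step`, `CurationLog` and `reconstruct` are illustrative names. The idea is only that a step cannot be registered without a description and a reference, and that every applied step leaves a log entry.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for a model: reaction id -> equation string.
Model = Dict[str, str]


@dataclass
class CurationLog:
    """Collects one human-readable entry per applied curation step."""
    entries: List[str] = field(default_factory=list)


def curation_step(description: str, reference: str):
    """Decorator that refuses undocumented steps and records documented ones."""
    if not description.strip() or not reference.strip():
        raise ValueError("every curation step needs a description and a reference")

    def wrap(func: Callable[[Model], Model]):
        def run(model: Model, log: CurationLog) -> Model:
            updated = func(dict(model))  # work on a copy, keep the input intact
            log.entries.append(f"{func.__name__}: {description} [{reference}]")
            return updated
        return run
    return wrap


# Placeholder step: the reaction id and the reference text are made up.
@curation_step("Remove a duplicated exchange reaction", "placeholder reference")
def remove_duplicate_exchange(model: Model) -> Model:
    model.pop("EX_dup", None)
    return model


def reconstruct(start: Model, log: CurationLog) -> Model:
    """Apply all curation steps in order, starting from any given model."""
    model = start
    for step in (remove_duplicate_exchange,):
        model = step(model, log)
    return model
```

Calling `reconstruct(start_model, CurationLog())` would then both rebuild the model and produce a list of documented steps that could be written next to the model file.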

@draeger
Collaborator

draeger commented Oct 26, 2020

Along those lines, we could start thinking about a minimum information requirement that should be reported about the steps taken to create a GEM. Such guidelines already exist for various other aspects of science; in systems biology, MIRIAM is a prominent example, but there are plenty of others. Of course, there is Ines Thiele's famous protocol for generating a high-quality GEM, but we could start collecting key points on what needs to go into the kind of documentation that @sulheim requests.

@Midnighter
Collaborator

> @Midnighter I am not familiar with gitbook, but from a brief look it seems like it might be too much work, and something that is not necessarily maintained along with the model on GitHub. Maybe a more realistic option is to create templates for model reconstruction scripts (e.g. in MATLAB or Python) that ensure a minimum of documentation along with the reconstruction.

I agree. gitbook was my recommendation for a more meta guide on how to construct models in general, not to serve as documentation alongside one specific model.

@sulheim
Author

sulheim commented Jan 27, 2021

In this context, I would like to discuss how one should organize model reconstruction and curation scripts as well as model files.
We are currently curating and re-organizing the Sco-GEM model folder (to adhere to the standard-GEM template), see SysBioChalmers/Sco-GEM#122.

We have encountered an issue: it is rather inconvenient to test or update curation scripts that have previously been used to update the model, because the model file in the repository is always the latest version. For example, if you have previously written and applied a script that deletes a few model reactions, and you want to modify and rerun that script, you cannot test it on the model file in the repository, since those reactions are already gone. One solution is to keep an archive folder with previous model versions, but I believe there might be more clever solutions to this issue.

What do you think?

@edkerk
Collaborator

edkerk commented Jan 27, 2021

But would an archive model folder not sort of defeat the purpose of git? Meanwhile, older releases can relatively easily be extracted from the local repository with e.g.

git show refs/tags/v1.4.2:model/standard-GEM.xml > model_v1_4_2.xml

or, for the latest master version:

git show master:model/standard-GEM.xml > model_master.xml
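
If a curation script is written in Python, the same extraction could be done from inside the script, so that it always runs against a known model version. The following is only a sketch: `git_show_command` and `extract_model` are hypothetical helper names, and the code simply wraps the `git show <ref>:<path>` call from the commands above with the standard-library `subprocess` module.

```python
import subprocess
from pathlib import Path
from typing import List


def git_show_command(ref: str, path_in_repo: str) -> List[str]:
    """Build the `git show <ref>:<path>` command used above."""
    return ["git", "show", f"{ref}:{path_in_repo}"]


def extract_model(ref: str, path_in_repo: str, out_file: str,
                  repo_dir: str = ".") -> Path:
    """Write the file as it existed at `ref` to `out_file` and return its path."""
    result = subprocess.run(
        git_show_command(ref, path_in_repo),
        cwd=repo_dir,
        check=True,            # raise CalledProcessError if git fails
        capture_output=True,   # collect the file content from stdout
    )
    target = Path(out_file)
    target.write_bytes(result.stdout)
    return target
```

For example, `extract_model("refs/tags/v1.4.2", "model/standard-GEM.xml", "model_v1_4_2.xml")` would mirror the first command, assuming it is run inside a clone that has that tag.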

@JonathanRob
Member

Somewhat related to this, we are implementing an approach with Human-GEM to deal with old curation/reconstruction scripts that do not work with the current model version: we move them to a deprecated folder (I also like @sulheim's archive suggestion as a name). This separates these scripts from those that are currently maintained, so there is no expectation that they still work.

If one wants to run an archived script, one can check out the commit in which the script was last modified or used; the model version at that commit should presumably be compatible with the script.
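
Finding that commit can be automated. The sketch below is an assumption about how one might do it, not how Human-GEM does it: `last_commit_command` and `last_commit_for` are hypothetical names, and the code relies only on `git log -1 --format=%H -- <path>`, which prints the hash of the most recent commit that touched a path.

```python
import subprocess
from typing import List


def last_commit_command(script_path: str) -> List[str]:
    """Build a git command that prints the last commit touching `script_path`."""
    return ["git", "log", "-1", "--format=%H", "--", script_path]


def last_commit_for(script_path: str, repo_dir: str = ".") -> str:
    """Return the hash of the most recent commit that changed `script_path`."""
    result = subprocess.run(
        last_commit_command(script_path),
        cwd=repo_dir,
        check=True,            # raise CalledProcessError if git fails
        capture_output=True,
        text=True,             # decode stdout as text
    )
    return result.stdout.strip()
```

The returned hash can be passed to `git checkout`. Note that moving a script into the deprecated folder is itself a commit, so the history of the file may need to be inspected further back (for example with `git log --follow`) to find the commit in which the script was actually last used.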

@sulheim
Author

sulheim commented Jan 28, 2021

I don't think an archive folder defeats the purpose of git (although I see your point @edkerk); I think git is much more than just access to previous model versions through the log. However, your suggestion of just reading the model file from the master branch seems pretty elegant. I still think that @JonathanRob has a good point, and these two solutions are not mutually exclusive. This is basically the same as what we have done with the sulheim2020 folder in the Sco-GEM repo.

@haowang-bioinfo
Member

@sulheim, a YAML-based workflow implemented in Human-GEM may provide another option for curating GEMs.

Previously, we also used scripts for adding/removing reactions and making changes to the model. As @JonathanRob mentioned, we are now archiving the old code and retiring the script-based approach. In the new workflow, only a YAML-format model file is retained in develop and the other fix/feature branches. Because the format is human-readable, changes made to the YAML file are evident and clear enough that curation no longer has to depend on scripts.

For example, in PR #213 a number of duplicated metabolites and reactions were removed by a series of commits, each of which resolves one duplicated metabolite. In particular, the metabolite malthx_s and the associated reaction EX_M02447[e] were deleted in this commit, where the annotation files were also updated. With this workflow, the changes can be made either by code or manually, and conveniently reviewed afterwards. A couple of helper functions (testYamlConversion, sanityCheck) are provided as checkpoints before and after making a PR to avoid mistakes.

This workflow is still under development and refinement, but it seems to work pretty well so far.
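
To give a feel for how such a review could be supported by code, here is a small Python sketch that lists which metabolite and reaction ids were removed or added between two model versions. It is not part of the Human-GEM workflow. It assumes, purely for illustration, that each YAML file has already been parsed into a dictionary with `metabolites` and `reactions` lists whose entries carry an `id` key; `ids` and `summarize_changes` are hypothetical names.

```python
from typing import Dict, List, Set

# Assumed shape of a parsed model: section name -> list of entries with an "id".
ParsedModel = Dict[str, List[dict]]


def ids(model: ParsedModel, section: str) -> Set[str]:
    """Collect the ids listed in one section of a parsed model."""
    return {entry["id"] for entry in model.get(section, [])}


def summarize_changes(old: ParsedModel,
                      new: ParsedModel) -> Dict[str, Dict[str, List[str]]]:
    """Report removed and added ids per section between two model versions."""
    summary: Dict[str, Dict[str, List[str]]] = {}
    for section in ("metabolites", "reactions"):
        before, after = ids(old, section), ids(new, section)
        summary[section] = {
            "removed": sorted(before - after),
            "added": sorted(after - before),
        }
    return summary
```

Applied to the versions before and after the commit mentioned above, a summary of this kind would be expected to list malthx_s under removed metabolites and EX_M02447[e] under removed reactions.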

@sulheim
Author

sulheim commented Feb 9, 2021

That's an interesting workflow @Hao-Chalmers. Although I understand that one can reproduce the model development by going through each individual commit and redoing the manual curations, it sounds less tidy than having all edits documented in a script. I think it makes sense to only have the YAML format in the devel folder, though (but is that compatible with the COBRA toolbox in MATLAB?).

@haowang-bioinfo
Member

haowang-bioinfo commented Feb 13, 2021

@sulheim The YAML format was actually adapted from COBRApy, so you'd have built-in support in a Python environment when using COBRA.

Yes, you can keep only the YAML file in devel and feat/fix branches for tracking model changes; the scripts are then probably not necessary (but they can still be kept, for example in an archive folder for reference).

@mihai-sysbio
Member

Reading this issue again, I find it an interesting discussion to continue. However, it would be very ambitious to construct a roadmap that would enable full reproducibility, including all the curation. Therefore, I propose to add the use of the deprecated folder to .standard-GEM.md and then convert this issue into a GitHub Discussion.

@sulheim
Author

sulheim commented Aug 10, 2021

Ok. But isn't archive a better name than deprecated?

@mihai-sysbio
Member

> Ok. But isn't archive a better name than deprecated?

The message I am trying to send is attuned to the definitions as in the Merriam-Webster dictionary:

archive

a place in which public records or historical materials (such as documents) are preserved

deprecate

to withdraw official support for or discourage the use of (something, such as a software product) in favor of a newer or better alternative

That being said, I would appreciate more thoughts on the matter, especially since I feel more people are familiar with the term archive.
