Documentation of model development #20

sulheim · 2020-08-27T19:14:58Z

Description of the issue:

It is not clear to me whether this is outside the scope of standard-GEM, but I think it would be useful to come up with a language-agnostic guideline / template for documenting the development process, so that it is easy for anyone to understand what is done in the model reconstruction, how it is done, and how to reproduce the current state of the model. One common practice (that I've used) is to have a script that performs the complete model reconstruction from any given starting point. This works reasonably well, but it is still not trivial for someone else to trace the reconstruction unless the code is very well documented.

What do you think is the best practice that should be recommended to users of standard-GEM?
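A minimal sketch of the "one script from a starting point" practice described above, assuming a toy model stored as a plain Python dict (the `{"reactions": {...}}` layout, the step names and the reaction IDs are all hypothetical). Each curation step is a small function whose docstring explains the change; undocumented steps are rejected, and the pipeline emits a changelog alongside the rebuilt model.

```python
import copy
from typing import Callable, Dict, List, Tuple

# Hypothetical toy model layout: {"reactions": {rxn_id: {"metabolites": {...}}}}
Model = Dict[str, dict]
Step = Callable[[Model], Model]

STEPS: List[Step] = []


def step(func: Step) -> Step:
    """Register a curation step, refusing steps without a docstring."""
    if not (func.__doc__ and func.__doc__.strip()):
        raise ValueError(f"step {func.__name__} has no docstring")
    STEPS.append(func)
    return func


@step
def remove_duplicate_reaction(model: Model) -> Model:
    """Remove R2, which duplicates R1 (hypothetical example)."""
    model["reactions"].pop("R2", None)
    return model


@step
def add_glucose_transport(model: Model) -> Model:
    """Add a hypothetical glucose transport reaction T1."""
    model["reactions"]["T1"] = {"metabolites": {"glc_e": -1, "glc_c": 1}}
    return model


def reconstruct(start: Model) -> Tuple[Model, List[str]]:
    """Apply all registered steps in order; return the model and a changelog."""
    model = copy.deepcopy(start)  # never mutate the starting point
    changelog = []
    for func in STEPS:
        model = func(model)
        changelog.append(f"{func.__name__}: {func.__doc__.strip()}")
    return model, changelog
```

Because the changelog is built from the docstrings, the documentation cannot silently drift away from the list of steps that were actually run.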

mihai-sysbio · 2020-09-03T06:00:20Z

What an interesting question! A git-based workflow allows for versioning of both code and model. For the model, there will be an input model (previous commit) and an output model (new commit). It sounds like a great idea to follow the approach described above. I'm not sure what would be easy enough, though, but I feel it ought to involve some way of gluing together the models and the code.
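One hypothetical way of "gluing together" models and code is a provenance log that links each curation script to the exact input and output model files through content hashes. The sketch below assumes an append-only JSON-lines file (the file name and record fields are illustrative, not an agreed convention); it complements, rather than replaces, the commit history.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(script: Path, input_model: Path, output_model: Path) -> dict:
    """Link a curation script to the exact input and output model files."""
    return {
        "script": str(script),
        "script_sha256": sha256_of(script),
        "input_model": str(input_model),
        "input_sha256": sha256_of(input_model),
        "output_model": str(output_model),
        "output_sha256": sha256_of(output_model),
    }


def append_record(log_file: Path, record: dict) -> None:
    """Append one record as a single JSON line, so the log diffs cleanly in git."""
    with open(log_file, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```

Committing the log together with the script and the new model means a reviewer can check that the recorded input hash matches the model file in the parent commit.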

Midnighter · 2020-09-03T07:55:06Z

If you have the energy to guide and maintain it, I think a public GitBook could be a great place for such a guide, since it could then be continuously updated by the community. It takes work to steer such a project and keep it a comprehensible whole, though.

haowang-bioinfo · 2020-09-29T08:52:26Z

Documentation of model curation is essential in GEM development. A well-defined Git-based workflow would help in achieving this goal and should therefore be within the scope of standard-GEM.

sulheim · 2020-10-26T17:27:46Z

@Midnighter I am not familiar with GitBook, but from a brief look it seems like it might be too much work, and it is not necessarily maintained along with the model on GitHub. Maybe a more realistic option is to create templates for model reconstruction scripts (e.g. in MATLAB or Python) that ensure a minimum of documentation along with the reconstruction.
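A template could be enforced automatically. The sketch below assumes a convention (hypothetical, not part of standard-GEM) that every Python curation script opens with a module docstring containing `Field: value` lines, and reports which required fields are missing; the field names are placeholders for whatever minimum the community agrees on. A MATLAB equivalent would inspect the leading comment block instead.

```python
import ast
from typing import List

# Hypothetical minimum header fields; the real list would need community agreement.
REQUIRED_FIELDS = ("Purpose", "Input", "Output", "Author")


def missing_header_fields(source: str) -> List[str]:
    """Return the required fields absent from a script's module docstring."""
    docstring = ast.get_docstring(ast.parse(source)) or ""
    present = {
        line.split(":", 1)[0].strip()
        for line in docstring.splitlines()
        if ":" in line
    }
    return [field for field in REQUIRED_FIELDS if field not in present]
```

Run over every script in the curation folder (for example in CI), this turns "please document your script" into a check that fails visibly when the header is incomplete.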

draeger · 2020-10-26T18:28:11Z

Along those lines, we could start thinking about a minimum information requirement for what should be reported about the steps taken to create a GEM. Such guidelines already exist for various other aspects of science; in systems biology MIRIAM is a prominent example, but there are plenty of others. Of course, there is Ines Thiele's famous protocol for generating a high-quality GEM, but we could start collecting key points on what needs to go into the kind of documentation that @sulheim requests.

Midnighter · 2020-10-26T18:58:04Z

> @Midnighter I am not familiar with GitBook, but from a brief look it seems like it might be too much work, and it is not necessarily maintained along with the model on GitHub. Maybe a more realistic option is to create templates for model reconstruction scripts (e.g. in MATLAB or Python) that ensure a minimum of documentation along with the reconstruction.

I agree. GitBook was my recommendation for a more general meta-guide on how to construct models in general, not for documentation alongside one specific model.

sulheim · 2021-01-27T20:04:18Z

In this context, I would like to discuss how one should organize model reconstruction and curation scripts as well as model files.
We are currently curating and reorganizing the Sco-GEM model folder (to adhere to the standard-GEM template), see SysBioChalmers/Sco-GEM#122.

We have encountered an issue: it is rather inconvenient to test or update curation scripts that have previously been used to update the model, because the model file in the repository is always the latest version. For example, if you have previously written and applied a script that deletes a few model reactions, and you want to modify and rerun that script, you cannot test it on the model file in the repository, since those reactions are already gone. One solution is to keep an archive folder with previous model versions, but I believe there might be more clever solutions to this issue.

What do you think?

edkerk · 2021-01-27T21:08:11Z

But wouldn't an archive folder of models sort of defeat the purpose of git? Older releases can instead be extracted relatively easily from the local repository with e.g.

```shell
git show refs/tags/v1.4.2:model/standard-GEM.xml > model_v1_4_2.xml
```

or the latest master version:

```shell
git show master:model/standard-GEM.xml > model_master.xml
```
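The `git show <ref>:<path>` commands above can also be called from a test harness, so that a curation script is exercised against an older model without checking anything out. A minimal sketch; the helper names are hypothetical, and only the plain `git show` invocation is assumed.

```python
import subprocess
from pathlib import Path
from typing import List


def git_show_args(ref: str, path: str) -> List[str]:
    """Build the argv that prints a file as it was at a given ref."""
    return ["git", "show", f"{ref}:{path}"]


def extract_model(ref: str, path: str, destination: Path, repo: Path = Path(".")) -> Path:
    """Write the model file at `ref` to `destination`; the work tree is untouched."""
    result = subprocess.run(
        git_show_args(ref, path), cwd=repo, check=True, capture_output=True
    )
    destination.write_bytes(result.stdout)  # raw bytes keep the file encoding intact
    return destination
```

For instance, `extract_model("refs/tags/v1.4.2", "model/standard-GEM.xml", Path("model_v1_4_2.xml"))` reproduces the first command, after which the modified script can be run against `model_v1_4_2.xml`.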

JonathanRob · 2021-01-28T07:40:51Z

Somewhat related to this, in Human-GEM we are implementing an approach for old curation/reconstruction scripts that no longer work with the current model version: we move them to a deprecated folder (I also like @sulheim's suggestion of "archive" as a name). This separates these scripts from those that are currently maintained, so there is no expectation that they still work.

If one wants to run an archived script, one can check out the commit in which the script was last modified or used; presumably the model version at that commit would be compatible with the script.
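Finding that commit can be scripted with `git log`. A hedged sketch (function names hypothetical): it lists the most recent commits touching a script, using `--follow` so history survives the move into the archive folder. Note that the move itself is then the newest commit touching the file, so the second hash (or the parent `<hash>^`, if the script's changes were committed together with the script) is likely the state of interest.

```python
import subprocess
from typing import List


def recent_commits_args(script_path: str, n: int = 1) -> List[str]:
    """argv listing the hashes of the last `n` commits that touched `script_path`."""
    return ["git", "log", "--follow", f"--max-count={n}", "--format=%H", "--", script_path]


def recent_commits(script_path: str, n: int = 1, repo: str = ".") -> List[str]:
    """Return up to `n` commit hashes, newest first."""
    out = subprocess.run(
        recent_commits_args(script_path, n),
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return out.stdout.split()
```

The returned hash can then be passed to `git checkout <hash>` (detached HEAD) or to `git show <hash>:<model path>` to fetch only the model.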

sulheim · 2021-01-28T20:29:50Z

I don't think an archive folder defeats the purpose of git (although I see your point, @edkerk); git offers much more than access to previous model versions through the log. That said, your suggestion of simply reading the model file from the master branch seems pretty elegant. I still think @JonathanRob has a good point, and the two solutions are not mutually exclusive. This is basically the same as what we have done with the sulheim2020 folder in the Sco-GEM repo.

haowang-bioinfo · 2021-01-28T22:08:31Z

@sulheim, a Yaml-based workflow implemented in Human-GEM may provide another option for curating GEMs.

Previously, we also used scripts for adding/removing reactions and making other changes to the model. As @JonathanRob mentioned, we are now archiving the old code and retiring the script-based approach. In the new workflow, only a Yaml-format model file is kept in the develop branch and in other fix/feature branches. Because the format is human-readable, changes made to the Yaml file are evident and clear enough that curation no longer depends on scripts.
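Beyond the line-by-line `git diff`, a reviewer might want a summary of which entities changed between two versions of the Yaml model. A minimal sketch, assuming both files have already been parsed into plain dicts whose sections (`metabolites`, `reactions`, `genes`) are lists of entries with an `id` key; the actual Human-GEM/cobrapy Yaml layout differs in detail, so the loading step is left out.

```python
from typing import Dict, Iterable, List


def index_by_id(entries: List[dict]) -> Dict[str, dict]:
    """Map each entry's 'id' to the entry itself."""
    return {entry["id"]: entry for entry in entries}


def diff_section(old: List[dict], new: List[dict]) -> Dict[str, List[str]]:
    """Compare one model section between two versions, keyed by id."""
    old_ids, new_ids = index_by_id(old), index_by_id(new)
    return {
        "added": sorted(new_ids.keys() - old_ids.keys()),
        "removed": sorted(old_ids.keys() - new_ids.keys()),
        "changed": sorted(
            i for i in old_ids.keys() & new_ids.keys() if old_ids[i] != new_ids[i]
        ),
    }


def diff_models(
    old: dict, new: dict, sections: Iterable[str] = ("metabolites", "reactions", "genes")
) -> Dict[str, Dict[str, List[str]]]:
    """Summarize added/removed/changed ids for each section."""
    return {s: diff_section(old.get(s, []), new.get(s, [])) for s in sections}
```

Keying by `id` rather than by line means that reordering entries in the file does not show up as a change.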

For example, in PR #213 a number of duplicated metabolites and reactions were removed through a series of commits, each of which resolves one duplicated metabolite. In particular, the metabolite malthx_s and the associated reaction EX_M02447[e] were deleted in this commit, where the annotation files were also updated. With this workflow, changes can be made either by code or manually, and conveniently reviewed afterwards. A couple of helper functions (testYamlConversion, sanityCheck) are provided as checkpoints before and after making a PR, to avoid mistakes.
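To illustrate the kind of checkpoint meant here, the sketch below runs a few generic consistency checks on a parsed model. It is not Human-GEM's `sanityCheck`; the checks and the dict layout (lists of entries with `id`, reactions carrying a `metabolites` mapping) are assumptions. It flags duplicate ids, reactions that reference undefined metabolites, and metabolites left unused, which is what remains when a reaction is deleted but its metabolite is not.

```python
from collections import Counter
from typing import List


def sanity_problems(model: dict) -> List[str]:
    """Return human-readable problems found in a parsed model dict."""
    problems = []
    for section in ("metabolites", "reactions"):
        counts = Counter(entry["id"] for entry in model.get(section, []))
        problems += [f"duplicate {section[:-1]} id: {i}" for i, c in counts.items() if c > 1]

    met_ids = {m["id"] for m in model.get("metabolites", [])}
    used = set()
    for rxn in model.get("reactions", []):
        for met in rxn.get("metabolites", {}):
            used.add(met)
            if met not in met_ids:
                problems.append(f"reaction {rxn['id']} uses undefined metabolite {met}")

    problems += [f"unused metabolite: {m}" for m in sorted(met_ids - used)]
    return problems
```

An empty list before opening a PR, and again after review edits, is the intended outcome.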

This workflow is still under development and refinement. But it seems that this works pretty well so far.

sulheim · 2021-02-09T13:11:36Z

That's an interesting workflow, @Hao-Chalmers. Although I understand that one can reproduce the model development by going through each individual commit and redoing the manual curations, it sounds less tidy than having all edits documented in a script. I think it makes sense to only have the yaml format in the devel folder, though (but is that compatible with the COBRA Toolbox in MATLAB?).

haowang-bioinfo · 2021-02-13T09:13:01Z

@sulheim The yaml format was actually adapted from cobrapy, so you get built-in support in a Python environment when using COBRA.

Yes, you can keep only the yaml file in the devel and feat/fix branches for tracking model changes; the scripts are then probably unnecessary (but they can still be kept for reference, e.g. in an archive folder).

mihai-sysbio · 2021-08-06T08:44:20Z

Reading this issue again, I find it an interesting discussion to continue. However, it would be very ambitious to construct a roadmap that enables full reproducibility, including all the curation. Therefore, I propose adding the use of the deprecated folder to .standard-GEM.md and then converting this issue into a GitHub Discussion.

sulheim · 2021-08-10T08:42:57Z

Ok. But isn't archive a better name than deprecated?

mihai-sysbio · 2021-08-10T08:50:01Z

> Ok. But isn't archive a better name than deprecated?

The message I am trying to send is attuned to the definitions as in the Merriam-Webster dictionary:
