
Tracker for Coverage over CellML Model Repository #23

Open · anandijain opened this issue Mar 21, 2021 · 12 comments

@anandijain (Contributor, author)

This issue will track our progress in testing CellMLToolkit.jl on the CellML Model Repository.

On a branch, I've added some functions that query the Model Repository for all of the "exposures" and then download them with curl. I've also added functions that build a DataFrame showing which models work and which don't.

This work is incomplete, and since the model repository is quite large, it takes a while to download.
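The querying step can be sketched outside Julia as well. Below is a minimal Python sketch of the same idea, collecting exposure links from a repository listing page; the page structure and the `/exposure/...` link pattern are assumptions for illustration, not the actual script from the branch.

```python
import re

def extract_exposure_urls(listing_html, base="https://models.physiomeproject.org"):
    """Collect unique /exposure/... links from a repository listing page.

    The HTML structure assumed here is hypothetical; the real scripts
    live on a Julia branch of CellMLToolkit.jl and use curl.
    """
    hrefs = re.findall(r'href="(/exposure/[^"]+)"', listing_html)
    seen, urls = set(), []
    for h in hrefs:
        if h not in seen:
            seen.add(h)
            urls.append(base + h)
    return urls

# Hypothetical listing fragment with a duplicate link.
sample = ('<a href="/exposure/abc123">Model A</a> '
          '<a href="/exposure/abc123">dup</a> '
          '<a href="/exposure/def456">Model B</a>')
print(extract_exposure_urls(sample))
```

Deduplicating while preserving order keeps one download per exposure even when a listing links the same exposure several times.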

We are planning to do something similar for SBML.jl and their test-suite so it'd be nice to have some consistency in testing.

I don't have the entire library, but from my sample of ~1000 models, I found that we can call solve on about 10% of these models and get back a Solution.

@shahriariravanian you've mentioned some of the issues that could be contributing to this 10% number. It would be good to mention them, so that as they get fixed we can see how this percentage changes.

@anandijain (Contributor, author)

Version 2.1.0 is giving ~178/940.

@anandijain (Contributor, author)

Version 2.2.0 is giving ~477/940.

@anandijain (Contributor, author)

It is a known issue that some files in the CellML Model Repository have bad XML or do not fit the CellML specification we target. (Aside: @shahriariravanian, which version of CellML are we guaranteeing support for?)

After removing Goldbeeter_2006 from my data folder, we now get 530/940. The problem is caused in EzXML: when it hits a parsing error, it pushes to a global error stack that prevents further usage. Why they do this, I have no idea...

@anandijain (Contributor, author)

861 CellML models
718 successfully converted to ODESystem
635 successfully converted to ODEProblem
595 successfully solved

We get 940 from the curls, but cloning the git repos returns 861, so that's where the discrepancy comes from.
595/861 is quite good IMO, as a lot of the models are truly defective.
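The counts above form a pipeline funnel (models → ODESystem → ODEProblem → solved). A minimal Python sketch that turns such counts into pass-through rates, using the numbers from this comment:

```python
def funnel_report(stages):
    """Format pass-through rates for a sequence of (label, count) pipeline stages,
    relative to the count of the first stage."""
    total = stages[0][1]
    return [f"{label}: {count}/{total} ({100 * count / total:.1f}%)"
            for label, count in stages]

stages = [
    ("CellML models", 861),
    ("converted to ODESystem", 718),
    ("converted to ODEProblem", 635),
    ("solved", 595),
]
for line in funnel_report(stages):
    print(line)
```

The biggest single drop here is ODESystem generation (861 → 718), which matches the defective-XML issues discussed above.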

@ChrisRackauckas (Member)

What are the issues you see?

@anandijain (Contributor, author)

This data is from @shahriariravanian. Could you shed some light on Chris's question?

@shahriariravanian (Collaborator)

The remaining issues are:

  1. Some CellML XML files are defective (missing some initial values). Currently, CellMLToolkit throws an error for these; the plan is to instead return a list of uninitialized variables for the user to provide values for.
  2. Some models have more than one independent variable (in fact, some use the partial_diff tag). This is uncommon in CellML models but is supported in the specs.
  3. The main remaining active issue is to implement imports completely. Currently, we have an incomplete implementation. Full import is rather complicated, as CellML XML files can recursively import and rename components and connections (links between variables from different components) from other files. Because of the connections, we may need to import some components implicitly.
  4. The ODEProblems that were not solved are not a big problem, as we used a fixed solver (TRBDF2) with some default parameters.
  5. Large models (XML size > 500K) can take a long time to generate an ODESystem. I'm going to profile to see where the main problem is, but we may need to change the strategy for how structural_simplify is used for very large models.
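Point 1's planned behavior, collecting the uninitialized variables instead of throwing on the first one, might look like this minimal Python sketch; the variable names are hypothetical, not from any particular model.

```python
def missing_initial_values(initial_values):
    """Return the names of state variables whose initial value is absent (None),
    rather than raising on the first missing one -- the behavior planned for
    defective models, so the user can supply the values themselves."""
    return [name for name, value in initial_values.items() if value is None]

# Hypothetical state map where V has no initial value in the XML.
states = {"V": None, "m": 0.05, "h": 0.6, "n": 0.325}
print(missing_initial_values(states))  # → ['V']
```

Returning the full list in one pass lets the user fix every gap at once instead of re-running after each error.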

@anandijain (Contributor, author)

Great, could you name a model with that issue? I'd like to look into it. Similarly for a model with missing vars and components.

Also, if you end up doing some profiling, I think it'd be good to add benchmarking to our testing of the model repo. I'm happy to add this too with BenchmarkTools.

This may help pin down inefficiencies, i.e. "is it dependent on parameter count, state count, etc.?"

@shahriariravanian (Collaborator)

These are the results of the latest run:

# outcome
867 CellML models
6 too large (>500K, excluded)
744 successfully converted to ODESystem
650 successfully converted to ODEProblem
608 successfully solved

@shahriariravanian (Collaborator)

Here is the result file as a CSV. The codes in the res column are:

0 -> fail to generate ODESystem
1 -> fail to generate ODEProblem
2 -> fail to solve ODEProblem
3 -> success!
9 -> too large a file, ignored

cellml_results.txt
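Tallying the outcomes from such a file is straightforward; a minimal Python sketch, assuming the CSV has a column named `res` (per the comment above) plus a model column. The sample rows below are hypothetical, not taken from the attached file.

```python
import csv
import io
from collections import Counter

# Code meanings as listed in the comment above.
LABELS = {
    "0": "fail to generate ODESystem",
    "1": "fail to generate ODEProblem",
    "2": "fail to solve ODEProblem",
    "3": "success",
    "9": "too large a file, ignored",
}

def tally_results(csv_text):
    """Count models per outcome code in a CSV with a 'res' column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["res"] for row in reader)

# Hypothetical sample rows for illustration.
sample = "model,res\na.cellml,3\nb.cellml,0\nc.cellml,3\nd.cellml,9\n"
counts = tally_results(sample)
for code in sorted(counts):
    print(f"{code} ({LABELS[code]}): {counts[code]}")
```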

@ChrisRackauckas (Member)

Try setting the runner to a lower tolerance; that should help the domain-error cases. If not, rewrite sqrt(x) as sqrt(abs(x)) so trial steps that overshoot don't error out but are instead rejected.
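The sqrt(abs(x)) guard is illustrated below in Python (in practice the rewrite would apply to the generated Julia right-hand side, not Python): a bare sqrt raises on a slightly negative trial value, while the guarded version returns a finite number, so an adaptive solver's error control can reject the step instead of the whole solve aborting.

```python
from math import sqrt

def safe_sqrt(x):
    """Guarded square root: sqrt(abs(x)) instead of raising on a slightly
    negative argument, so an adaptive stepper can reject the offending
    trial step rather than abort with a domain error."""
    return sqrt(abs(x))

print(safe_sqrt(4.0))
print(safe_sqrt(-1e-12))  # tiny negative overshoot from a trial step
```

The trade-off is that a genuinely wrong (large negative) argument is silently mapped to a positive one, so this is a pragmatic guard for small overshoots, not a fix for a mis-specified model.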

@shahriariravanian (Collaborator)

These are the latest tracking results using version 2.4.1 (to be pushed soon):

# outcome
867 CellML models
6 too large (>500K, excluded)
775 successfully converted to ODESystem
688 successfully converted to ODEProblem
643 successfully solved

cellml_results_8.txt
