Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coverage tracker for sbml-test-suite and BioModels #35

Closed
anandijain opened this issue Apr 11, 2021 · 24 comments
Closed

Coverage tracker for sbml-test-suite and BioModels #35

anandijain opened this issue Apr 11, 2021 · 24 comments

Comments

@anandijain
Copy link
Collaborator

So I've started testing by just trying to readModel on all of the test-suite cases.

It looks like 426 unique case folders had failures, out of 1780 (I think).

ArgumentError("cannot convert NULL to string") was at least 99% of these errors.

I take it this just means the model file was empty or something? I'm wondering if the files that errored are specific Levels or Versions.

I printed the filenames and error for all the ones that are failing from the test suite, but have not investigated why they fail.
https://github.com/anandijain/SBMLBioModelsRepository.jl/runs/2318653674#step:5:121

example of one that fails:

sbml-test-suite/semantic/00930/00930-sbml-l3v1.xml" => ArgumentError("cannot convert NULL to string") 
@exaexa
Copy link
Collaborator

exaexa commented Apr 11, 2021

Hello!
seems like there's no string identifier where we actually expect one. Can you get a backtrace of one of the errors manually (to see which line in readsbml.jl is broken).
Thanks!

@anandijain
Copy link
Collaborator Author

julia> readSBML(bad[1].first)
ERROR: ArgumentError: cannot convert NULL to string
Stacktrace:
 [1] unsafe_string
   @ .\strings\string.jl:70 [inlined]
 [2] unsafe_string
   @ .\c.jl:193 [inlined]
 [3] extractModel(mdl::Ptr{Nothing})
   @ SBML ~\.julia\packages\SBML\ZhP2L\src\readsbml.jl:165

@anandijain
Copy link
Collaborator Author

anandijain commented Apr 11, 2021

The only other error was

"c:\\Users\\Anand\\.julia\\dev\\SBMLBioModelsRepository\\src\\../data/sbml-test-suite/semantic/01472\\enzyme_model.xml" => AssertionError("Opening SBML document has reported errors")
 "c:\\Users\\Anand\\.julia\\dev\\SBMLBioModelsRepository\\src\\../data/sbml-test-suite/semantic/01473\\enzyme_model.xml" => AssertionError("Opening SBML document has reported errors")```

But these I think are bad anyways, since it's not in the standard cases format.

@exaexa
Copy link
Collaborator

exaexa commented Apr 11, 2021

The 2 look like not even SBML can do anything with it. :]

Re the string error, this looks like there are Species without ID. Does that make sense? Should we just ignore them?

@anandijain
Copy link
Collaborator Author

Yeah that sounds good. I don't think there's a way to resolve that. Is it easy to make that error more specific? Having Species has no id is much more satisfying than NULL string

@exaexa
Copy link
Collaborator

exaexa commented Apr 11, 2021

Yeah, the question is more like whether it is more reasonable if we should try to "fix" the error (just omit the species or take the ID from some other place (is there any species identifier in the file?)), or fail loudly, preventing the load of whole SBML that we assume "invalid".

@anandijain
Copy link
Collaborator Author

I'd rather be stricter. It's less work and we have a better guarantee of correctness. If it comes up that a lot of the biomd dataset has this problem we may need to. I very much want to avoid writing a parser atop the parser.

@exaexa
Copy link
Collaborator

exaexa commented Apr 11, 2021

Ah ok, nice.

One last thing, could you please track the original download URL of some of the files that fail, e.g.
/home/runner/work/SBMLBioModelsRepository.jl/SBMLBioModelsRepository.jl/src/../data/sbml-test-suite/semantic/00870/00870-sbml-l2v1.xml
?

Thanks a lot!

@anandijain
Copy link
Collaborator Author

https://github.com/anandijain/sbml-test-suite/tree/main/semantic/00870 here I think?
This is my lightweight version of the real suite (as their git tree is ~500MB 😅 ) It should be the same path though

@anandijain anandijain changed the title Testing on the sbml-test-suite Testing on the sbml-test-suite and BioModels Apr 12, 2021
@anandijain
Copy link
Collaborator Author

So the sbml-test-suite folder is ~65 MB, but the BioModels set is ~4 GB, so we probably don't want to test biomd on every push/PR.

However, I still think we want to test the whole thing maybe on major releases or ad-hoc.

I also found https://github.com/actions/cache and https://github.com/actions/upload-artifact.

I think the first may allow us to store the BioModels dataset and test-suite, we get 5 GB maximum, which is perfect. This would also greatly reduce the hassle and latency of having to download every time.

The second action might be good to upload a results dataframe after each CI run, so we have easier access to the results than having to look through the actual logs.

I haven't used either of these actions before though

@anandijain
Copy link
Collaborator Author

@anandijain can you please check whether this at least partially improves the problem in #35?

@exaexa https://github.com/anandijain/SBMLBioModelsRepository.jl/runs/2335113429#step:5:106. This link points to the filenames that failed.

All tests are passing.
This is the main test I'm watching @test length(bad) == 1368 # regression test to check that the number of files that readSBML failed on stays the same.

What exactly should I be looking for that has improved? I don't see any differences

@exaexa
Copy link
Collaborator

exaexa commented Apr 13, 2021

This is weird, did you use the correct git branch? (you need latest master). The 00870 file loads okay for me now here:

julia> using SBML
[ Info: Precompiling SBML [e5567a89-2604-4b09-9718-f5f78e97c3bb]

julia> readSBML("00870-sbml-l3v1.xml")
Model(Dict("Kr1" => 2.0,"Kr" => 2.0,"Kf1" => 1.0,"Kf" => 1.0), Dict{String,Array{UnitPart,1}}("volume" => [UnitPart("litre", 1, 0, 1.0)],"time" => [UnitPart("second", 1, 0, 1.0)],"substance" => [UnitPart("mole", 1, 0, 1.0)]), Dict{String,SBML.Compartment}("C" => SBML.Compartment(nothing, true, 3, 1.0, "volume", nothing, nothing)), Dict{String,Species}("A2" => Species(nothing, "C", nothing, nothing, (3.0, "substance"), false, nothing, nothing),"A3" => Species(nothing, "C", nothing, nothing, (4.0, "substance"), false, nothing, nothing),"A1" => Species(nothing, "C", nothing, nothing, (2.0, "substance"), false, nothing, nothing),"A4" => Species(nothing, "C", nothing, nothing, (1.0, "substance"), false, nothing, nothing)), Dict{String,Reaction}("fasterReaction" => Reaction(Dict("A2" => 1.0,"A4" => -1.0), (-Inf, ""), (Inf, ""), 0.0, nothing, SBML.MathApply("*", SBML.Math[SBML.MathApply("+", SBML.Math[SBML.MathApply("*", SBML.Math[SBML.MathIdent("Kf1"), SBML.MathIdent("A4")]), SBML.MathApply("-", SBML.Math[SBML.MathApply("*", SBML.Math[SBML.MathIdent("Kr1"), SBML.MathIdent("A2")])])]), SBML.MathIdent("C")]), nothing, nothing),"slowerReaction1" => Reaction(Dict("A2" => -1.0,"A3" => 1.0,"A1" => -1.0), (-Inf, ""), (Inf, ""), 0.0, nothing, SBML.MathApply("*", SBML.Math[SBML.MathApply("+", SBML.Math[SBML.MathApply("*", SBML.Math[SBML.MathApply("*", SBML.Math[SBML.MathIdent("Kf"), SBML.MathIdent("A1")]), SBML.MathIdent("A2")]), SBML.MathApply("-", SBML.Math[SBML.MathApply("*", SBML.Math[SBML.MathIdent("Kr"), SBML.MathIdent("A3")])])]), SBML.MathIdent("C")]), nothing, nothing)), Dict{String,SBML.GeneProduct}(), Dict{String,SBML.FunctionDefinition}(), nothing, nothing)```

@anandijain
Copy link
Collaborator Author

I'm using master, as in ]add SBML.

@anandijain
Copy link
Collaborator Author

anandijain commented Apr 13, 2021

No sorry, apologies. Let me try using master

@anandijain
Copy link
Collaborator Author

https://github.com/anandijain/SBMLBioModelsRepository.jl/runs/2335417184#step:5:143

Cool! only two files failed. It'd be good to tag a release

@exaexa
Copy link
Collaborator

exaexa commented Apr 13, 2021

Great to hear that. We're tagging 0.4.0 once #39 is in. :]

@exaexa
Copy link
Collaborator

exaexa commented Apr 13, 2021

Anyway, this is now solved in master, thus closing here. Feel free to suggest more fixes :]

@exaexa exaexa closed this as completed Apr 13, 2021
@anandijain
Copy link
Collaborator Author

anandijain commented Apr 13, 2021

Actually, I think we can keep this open as a coverage tracker. Similar to SciML/CellMLToolkit.jl#23

Here are the BioModels that failed https://github.com/anandijain/SBMLBioModelsRepository.jl/runs/2336326622?check_suite_focus=true#step:5:1700

This ran I limited to 200, but I'm about to run on all of them and see if it times out.

This is the full test build I just started https://github.com/anandijain/SBMLBioModelsRepository.jl/runs/2336401356?check_suite_focus=true

@anandijain anandijain changed the title Testing on the sbml-test-suite and BioModels Coverage tracker for sbml-test-suite and BioModels Apr 13, 2021
@exaexa
Copy link
Collaborator

exaexa commented Apr 13, 2021

I guess it might be better to do automatic coverage with tests, would that work?

@anandijain
Copy link
Collaborator Author

Yes, working up to that I think.

Additionally I don't want to spam BioModels with 4GB of traffic every push/PR

@exaexa
Copy link
Collaborator

exaexa commented Apr 13, 2021

there are some ways to cache the large datasets, not sure if that can be done with GH actions, but if you have e.g. local gitlab there should be some good way to do it efficiently. @laurentheirendt is a wizard in that area, maybe he'll have some quick solution.

@anandijain
Copy link
Collaborator Author

A lot of The required attribute 'reversible' is missing from the <reaction> with the id '...'. come up.

There is a LOT of output from those warnings. There is https://github.com/JuliaIO/Suppressor.jl but I'd rather collect all of the warnings to see which are most common, etc

@exaexa
Copy link
Collaborator

exaexa commented Apr 13, 2021

"The required attribute" doesn't sound like from SBML.jl, perhaps it's from libsbml? (I can't easily suppress the log messages from there)

@anandijain
Copy link
Collaborator Author

anandijain commented Apr 13, 2021

Okay I looked through the log (where I print filename => error). Vast majority of them are the unhelpful AssertionError("Opening SBML document has reported errors") of which I'll have to record the stack traces or warnings or something.

There are maybe 20 with this error: MODEL1703310000.xml" => DomainError(Ptr{Nothing} @0x000000003c9e4e50 Calling Parameter_getUnits returned NULL, valid string expected.")

Currently readSBML fails on 730/2226 of the biomodels set

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants