Coverage tracker for sbml-test-suite and BioModels #35

anandijain · 2021-04-11T19:59:05Z

So I've started testing by just trying to readModel on all of the test-suite cases.

It looks like 426 unique case folders had failures, out of 1780 (I think).

ArgumentError("cannot convert NULL to string") was at least 99% of these errors.

I take it this just means the model file was empty or something? I'm wondering if the files that errored are specific Levels or Versions.

I printed the filenames and error for all the ones that are failing from the test suite, but have not investigated why they fail.
https://github.com/anandijain/SBMLBioModelsRepository.jl/runs/2318653674#step:5:121

example of one that fails:

sbml-test-suite/semantic/00930/00930-sbml-l3v1.xml" => ArgumentError("cannot convert NULL to string")

The text was updated successfully, but these errors were encountered:

exaexa · 2021-04-11T20:06:09Z

Hello!
seems like there's no string identifier where we actually expect one. Can you get a backtrace of one of the errors manually (to see which line in readsbml.jl is broken).
Thanks!

anandijain · 2021-04-11T20:11:58Z

julia> readSBML(bad[1].first)
ERROR: ArgumentError: cannot convert NULL to string
Stacktrace:
 [1] unsafe_string
   @ .\strings\string.jl:70 [inlined]
 [2] unsafe_string
   @ .\c.jl:193 [inlined]
 [3] extractModel(mdl::Ptr{Nothing})
   @ SBML ~\.julia\packages\SBML\ZhP2L\src\readsbml.jl:165

anandijain · 2021-04-11T20:16:36Z

The only other error was

"c:\\Users\\Anand\\.julia\\dev\\SBMLBioModelsRepository\\src\\../data/sbml-test-suite/semantic/01472\\enzyme_model.xml" => AssertionError("Opening SBML document has reported errors")
 "c:\\Users\\Anand\\.julia\\dev\\SBMLBioModelsRepository\\src\\../data/sbml-test-suite/semantic/01473\\enzyme_model.xml" => AssertionError("Opening SBML document has reported errors")```

But these I think are bad anyways, since it's not in the standard cases format.

exaexa · 2021-04-11T20:41:56Z

The 2 look like not even SBML can do anything with it. :]

Re the string error, this looks like there are Species without ID. Does that make sense? Should we just ignore them?

anandijain · 2021-04-11T20:50:13Z

Yeah that sounds good. I don't think there's a way to resolve that. Is it easy to make that error more specific? Having Species has no id is much more satisfying than NULL string

exaexa · 2021-04-11T21:09:24Z

Yeah, the question is more like whether it is more reasonable if we should try to "fix" the error (just omit the species or take the ID from some other place (is there any species identifier in the file?)), or fail loudly, preventing the load of whole SBML that we assume "invalid".

anandijain · 2021-04-11T21:18:09Z

I'd rather be stricter. It's less work and we have a better guarantee of correctness. If it comes up that a lot of the biomd dataset has this problem we may need to. I very much want to avoid writing a parser atop the parser.

exaexa · 2021-04-11T21:42:19Z

Ah ok, nice.

One last thing, could you please track the original download URL of some of the files that fail, e.g.
/home/runner/work/SBMLBioModelsRepository.jl/SBMLBioModelsRepository.jl/src/../data/sbml-test-suite/semantic/00870/00870-sbml-l2v1.xml
?

Thanks a lot!

anandijain · 2021-04-11T21:45:52Z

https://github.com/anandijain/sbml-test-suite/tree/main/semantic/00870 here I think?
This is my lightweight version of the real suite (as their git tree is ~500MB 😅 ) It should be the same path though

anandijain · 2021-04-12T18:24:55Z

So the sbml-test-suite folder is ~65 MB, but the BioModels set is ~4 GB, so we probably don't want to test biomd on every push/PR.

However, I still think we want to test the whole thing maybe on major releases or ad-hoc.

I also found https://github.com/actions/cache and https://github.com/actions/upload-artifact.

I think the first may allow us to store the BioModels dataset and test-suite, we get 5 GB maximum, which is perfect. This would also greatly reduce the hassle and latency of having to download every time.

The second action might be good to upload a results dataframe after each CI run, so we have easier access to the results than having to look through the actual logs.

I haven't used either of these actions before though

Improves upon #35

anandijain · 2021-04-13T16:16:35Z

@anandijain can you please check whether this at least partially improves the problem in #35?

@exaexa https://github.com/anandijain/SBMLBioModelsRepository.jl/runs/2335113429#step:5:106. This link points to the filenames that failed.

All tests are passing.
This is the main test I'm watching @test length(bad) == 1368 # regression test to check that the number of files that readSBML failed on stays the same.

What exactly should I be looking for that has improved? I don't see any differences

exaexa · 2021-04-13T16:21:21Z

This is weird, did you use the correct git branch? (you need latest master). The 00870 file loads okay for me now here:

julia> using SBML
[ Info: Precompiling SBML [e5567a89-2604-4b09-9718-f5f78e97c3bb]

julia> readSBML("00870-sbml-l3v1.xml")
Model(Dict("Kr1" => 2.0,"Kr" => 2.0,"Kf1" => 1.0,"Kf" => 1.0), Dict{String,Array{UnitPart,1}}("volume" => [UnitPart("litre", 1, 0, 1.0)],"time" => [UnitPart("second", 1, 0, 1.0)],"substance" => [UnitPart("mole", 1, 0, 1.0)]), Dict{String,SBML.Compartment}("C" => SBML.Compartment(nothing, true, 3, 1.0, "volume", nothing, nothing)), Dict{String,Species}("A2" => Species(nothing, "C", nothing, nothing, (3.0, "substance"), false, nothing, nothing),"A3" => Species(nothing, "C", nothing, nothing, (4.0, "substance"), false, nothing, nothing),"A1" => Species(nothing, "C", nothing, nothing, (2.0, "substance"), false, nothing, nothing),"A4" => Species(nothing, "C", nothing, nothing, (1.0, "substance"), false, nothing, nothing)), Dict{String,Reaction}("fasterReaction" => Reaction(Dict("A2" => 1.0,"A4" => -1.0), (-Inf, ""), (Inf, ""), 0.0, nothing, SBML.MathApply("*", SBML.Math[SBML.MathApply("+", SBML.Math[SBML.MathApply("*", SBML.Math[SBML.MathIdent("Kf1"), SBML.MathIdent("A4")]), SBML.MathApply("-", SBML.Math[SBML.MathApply("*", SBML.Math[SBML.MathIdent("Kr1"), SBML.MathIdent("A2")])])]), SBML.MathIdent("C")]), nothing, nothing),"slowerReaction1" => Reaction(Dict("A2" => -1.0,"A3" => 1.0,"A1" => -1.0), (-Inf, ""), (Inf, ""), 0.0, nothing, SBML.MathApply("*", SBML.Math[SBML.MathApply("+", SBML.Math[SBML.MathApply("*", SBML.Math[SBML.MathApply("*", SBML.Math[SBML.MathIdent("Kf"), SBML.MathIdent("A1")]), SBML.MathIdent("A2")]), SBML.MathApply("-", SBML.Math[SBML.MathApply("*", SBML.Math[SBML.MathIdent("Kr"), SBML.MathIdent("A3")])])]), SBML.MathIdent("C")]), nothing, nothing)), Dict{String,SBML.GeneProduct}(), Dict{String,SBML.FunctionDefinition}(), nothing, nothing)```

anandijain · 2021-04-13T16:22:46Z

I'm using master, as in ]add SBML.

anandijain · 2021-04-13T16:24:25Z

No sorry, apologies. Let me try using master

anandijain · 2021-04-13T16:31:30Z

https://github.com/anandijain/SBMLBioModelsRepository.jl/runs/2335417184#step:5:143

Cool! only two files failed. It'd be good to tag a release

exaexa · 2021-04-13T16:34:15Z

Great to hear that. We're tagging 0.4.0 once #39 is in. :]

exaexa · 2021-04-13T16:35:15Z

Anyway, this is now solved in master, thus closing here. Feel free to suggest more fixes :]

anandijain · 2021-04-13T18:36:39Z

Actually, I think we can keep this open as a coverage tracker. Similar to SciML/CellMLToolkit.jl#23

Here are the BioModels that failed https://github.com/anandijain/SBMLBioModelsRepository.jl/runs/2336326622?check_suite_focus=true#step:5:1700

This ran I limited to 200, but I'm about to run on all of them and see if it times out.

This is the full test build I just started https://github.com/anandijain/SBMLBioModelsRepository.jl/runs/2336401356?check_suite_focus=true

exaexa · 2021-04-13T18:39:13Z

I guess it might be better to do automatic coverage with tests, would that work?

anandijain · 2021-04-13T18:42:17Z

Yes, working up to that I think.

Additionally I don't want to spam BioModels with 4GB of traffic every push/PR

exaexa · 2021-04-13T18:48:23Z

there are some ways to cache the large datasets, not sure if that can be done with GH actions, but if you have e.g. local gitlab there should be some good way to do it efficiently. @laurentheirendt is a wizard in that area, maybe he'll have some quick solution.

anandijain · 2021-04-13T19:17:45Z

A lot of The required attribute 'reversible' is missing from the <reaction> with the id '...'. come up.

There is a LOT of output from those warnings. There is https://github.com/JuliaIO/Suppressor.jl but I'd rather collect all of the warnings to see which are most common, etc

exaexa · 2021-04-13T20:17:29Z

"The required attribute" doesn't sound like from SBML.jl, perhaps it's from libsbml? (I can't easily suppress the log messages from there)

anandijain · 2021-04-13T22:50:06Z

Okay I looked through the log (where I print filename => error). Vast majority of them are the unhelpful AssertionError("Opening SBML document has reported errors") of which I'll have to record the stack traces or warnings or something.

There are maybe 20 with this error: MODEL1703310000.xml" => DomainError(Ptr{Nothing} @0x000000003c9e4e50 Calling Parameter_getUnits returned NULL, valid string expected.")

Currently readSBML fails on 730/2226 of the biomodels set

anandijain changed the title ~~Testing on the sbml-test-suite~~ Testing on the sbml-test-suite and BioModels Apr 12, 2021

exaexa added a commit that referenced this issue Apr 13, 2021

Species.name is optional, don't fail if it's not present

50d9512

Improves upon #35

exaexa mentioned this issue Apr 13, 2021

Sanitize string handling, make species' names optional #37

Merged

exaexa added a commit that referenced this issue Apr 13, 2021

Species.name is optional, don't fail if it's not present

bd12b46

Improves upon #35

exaexa closed this as completed Apr 13, 2021

anandijain changed the title ~~Testing on the sbml-test-suite and BioModels~~ Coverage tracker for sbml-test-suite and BioModels Apr 13, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Coverage tracker for sbml-test-suite and BioModels #35

Coverage tracker for sbml-test-suite and BioModels #35

anandijain commented Apr 11, 2021

exaexa commented Apr 11, 2021

anandijain commented Apr 11, 2021

anandijain commented Apr 11, 2021 •

edited

Loading

exaexa commented Apr 11, 2021

anandijain commented Apr 11, 2021

exaexa commented Apr 11, 2021

anandijain commented Apr 11, 2021

exaexa commented Apr 11, 2021

anandijain commented Apr 11, 2021

anandijain commented Apr 12, 2021

anandijain commented Apr 13, 2021

exaexa commented Apr 13, 2021

anandijain commented Apr 13, 2021

anandijain commented Apr 13, 2021 •

edited

Loading

anandijain commented Apr 13, 2021

exaexa commented Apr 13, 2021

exaexa commented Apr 13, 2021 •

edited

Loading

anandijain commented Apr 13, 2021 •

edited

Loading

exaexa commented Apr 13, 2021

anandijain commented Apr 13, 2021

exaexa commented Apr 13, 2021

anandijain commented Apr 13, 2021

exaexa commented Apr 13, 2021 •

edited

Loading

anandijain commented Apr 13, 2021 •

edited

Loading

Coverage tracker for sbml-test-suite and BioModels #35

Coverage tracker for sbml-test-suite and BioModels #35

Comments

anandijain commented Apr 11, 2021

exaexa commented Apr 11, 2021

anandijain commented Apr 11, 2021

anandijain commented Apr 11, 2021 • edited Loading

exaexa commented Apr 11, 2021

anandijain commented Apr 11, 2021

exaexa commented Apr 11, 2021

anandijain commented Apr 11, 2021

exaexa commented Apr 11, 2021

anandijain commented Apr 11, 2021

anandijain commented Apr 12, 2021

anandijain commented Apr 13, 2021

exaexa commented Apr 13, 2021

anandijain commented Apr 13, 2021

anandijain commented Apr 13, 2021 • edited Loading

anandijain commented Apr 13, 2021

exaexa commented Apr 13, 2021

exaexa commented Apr 13, 2021 • edited Loading

anandijain commented Apr 13, 2021 • edited Loading

exaexa commented Apr 13, 2021

anandijain commented Apr 13, 2021

exaexa commented Apr 13, 2021

anandijain commented Apr 13, 2021

exaexa commented Apr 13, 2021 • edited Loading

anandijain commented Apr 13, 2021 • edited Loading

anandijain commented Apr 11, 2021 •

edited

Loading

anandijain commented Apr 13, 2021 •

edited

Loading

exaexa commented Apr 13, 2021 •

edited

Loading

anandijain commented Apr 13, 2021 •

edited

Loading

exaexa commented Apr 13, 2021 •

edited

Loading

anandijain commented Apr 13, 2021 •

edited

Loading