-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
igraph optimisation for graphs processing to speedup ISA-Tab write #365
Conversation
…zation roundtriping with ISA-JSON, closes #334
Hi David, the test was working in |
When I left it, I took a closer look at this Anyhow, it makes sense to have a single |
It seems, checking the Travis CI build history, that the commit that broke that test was this one: e0aaeb3 EDIT: I'd say this is the commit that changed the |
🤷 |
Oh, yeah in response to your EDIT: yes, that commit is what will have broken it in ISAdatasets! |
I've reverted |
Quick update: as
@djcomlab: it seems one source is not found in the P.S. Tests seem to run twice as faster with igraph, which is great!! |
Good news about the tests speeding up! Incidentally, loading ISA-Tabs isn't super quick when tables get large as well (but not impossibly so) but I can't think of a possible fix for that off the top of my head right now.
In line L1130
the I'll take a look. |
Oh I remember why this error is probably coming out. The way in which Sample Tabs are loaded tries to map each sample to their derived "source" sample. However, in some Sample Tabs like in GSB3 there are sources that do not have any further sample derived from them. This results in some I actually did make a further deviation from the previous implementation in L1214-1217 in I will revert this part. |
Incidentally, the test that use the sample tab file The test on |
thx @djcomlab. We indeed discussed this issue with @Zigur, i.e. that this test could trip Travis-CI. Much appreciated. Very nice performance gain. |
While reviewing the PR and having a look at "python-igraph", I noticed that this is released under GPLv2: https://github.com/igraph/python-igraph/blob/master/LICENSE |
Oh well, that's too bad. |
Hi, we tryed this branch for Brapi2ISA. Thanks a lot for the help @djcomlab @Zigur @proccaserra |
Hi @cpommier, that's strange since it passes all the ISA API standard tests. When I run the same test as you with brapi-to-isa, it seems to be stuck here: Line 1129 in ed87dc8
This line is trying to find the end nodes in the source to sample graph (study-level) but it shouldn't get stuck. I will try find some time to investigate... |
Interesting, this source to sample is indeed something that intrigs us:elixir-europe/plant-brapi-to-isa#70 |
@cpommier , I have now run the conversion using the docker container with the endpoint detail you gave and got the following: 2020-12-07 15:07:58,136 [INFO]: brapi_to_isa.py(:69) >>trials IDs to be exported : ['dXJuOlVSR0kvdHJpYWwvMjM=']
|
About the But |
@djcomlab I guess we need more information from @cpommier as I managed to run the code from the elixir repo and it doesn't seem to be using graph. My gut is that the underlying issue is a borked mapping between BRAPI observation/observation Levels and ISA Assay. I don't think we can solve this without a call. |
Has been addressed with PR #403 |
This PR contains a new implementation of the
isatab._all_end_to_end_paths()
function that was slowing down when writing very large study and assay tables.I reimplemented the algorithms to use the
igraph
(https://igraph.org) package who's core implementation is based on C and is made available in Python via thepython-igraph
package.The changes required a reimplementation of the
model._build_assay_graph()
that is called when callingStudy.graph
andAssay.graph
attributes. I have added these changes as_build_assay_graph_igraph()
and renamed the originalnetworkx
version to_build_assay_graph_networkx()
.I have left
.graph
to use thenetworkx
implementation as I can see it is now used elsewhere in the ISA-API. I added a new attribute toStudy
andAssay
to use.igraph
to reference the new implementation.I used
test_json2isatab.py
as a quick and dirty benchmark (since we know ISA-JSON loading is relatively quick, so the speedups in test running times should be about the ISA-Tab write) and I observe about a 3x speedup - takes about 60-65 seconds to run all oftest_json2isatab.py
when usingnetworkx
(before the changes) and about 17-20 seconds when usingpython-igraph
(this PR) on a 4-core Intel i7 2.7 GHz MacBook Pro with 16GB RAM.I have not really run any of the other tests to check for any unexpected behaviour since this is branched from the
develop
branch and it looks like there's a lot of things that need cleaning up by the ISA dev team in Oxford.Enjoy!