igraph optimisation for graphs processing to speedup ISA-Tab write #365

Closed
wants to merge 59 commits into from

djcomlab
Member

This PR contains a new implementation of the isatab._all_end_to_end_paths() function, which was slow when writing very large study and assay tables.

I reimplemented the algorithms to use the igraph (https://igraph.org) package, whose core implementation is written in C and is made available in Python via the python-igraph package.

The changes required a reimplementation of model._build_assay_graph(), which is called when accessing the Study.graph and Assay.graph attributes. I have added these changes as _build_assay_graph_igraph() and renamed the original networkx version to _build_assay_graph_networkx().

I have left .graph using the networkx implementation, as I can see it is now used elsewhere in the ISA-API. I added a new attribute, .igraph, to Study and Assay to reference the new implementation.

I used test_json2isatab.py as a quick and dirty benchmark: since we know ISA-JSON loading is relatively quick, any speedup in test running time should come from the ISA-Tab write. I observe about a 3x speedup. Running all of test_json2isatab.py takes about 60-65 seconds with networkx (before the changes) and about 17-20 seconds with python-igraph (this PR) on a 4-core Intel i7 2.7 GHz MacBook Pro with 16GB RAM.
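
For reference, the speedup range implied by those timings can be worked out as below. This is only arithmetic on the figures quoted in this comment; the function name is made up for illustration and nothing here re-runs the benchmark.

```python
def speedup_range(before, after):
    """Return (lowest, highest) speedup from (low, high) timings in seconds."""
    before_lo, before_hi = before
    after_lo, after_hi = after
    # Lowest speedup: fastest "before" run against slowest "after" run.
    # Highest speedup: slowest "before" run against fastest "after" run.
    return before_lo / after_hi, before_hi / after_lo


# Timings quoted above: 60-65 s with networkx, 17-20 s with python-igraph.
low, high = speedup_range(before=(60, 65), after=(17, 20))
print(f"speedup between {low:.1f}x and {high:.1f}x")
# prints "speedup between 3.0x and 3.8x"
```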

I have not really run any of the other tests to check for unexpected behaviour, since this is branched from the develop branch and it looks like there are a lot of things that need cleaning up by the ISA dev team in Oxford.

Enjoy!

Philippe Rocca-Serra and others added 30 commits February 11, 2020 16:25
@Zigur
Contributor

Zigur commented Oct 13, 2020

Hi David, the test was working in master until the last release. Now it generates an investigation file without double-quoted items. This might be due to some other change that was introduced to fix a quoting error that was reported to us (I am not sure). The investigation file itself is correct; it's just the quoting that is missing. We are not using tests-dev any longer.

@djcomlab
Member Author

When I left it, tests was the test data directory for master, and tests-dev was for develop because there were some problems found in the test data of tests that were then fixed in tests-dev after some checks on the functionality of develop.

I took a closer look at this json2isatab BII-S-3 investigation test and the issue was in BII-S-3.json, where in tests the study identifier is #study/BII-S-3 (probably a bug when BII-S-3.json was generated as isatab2json uses those #study type prefixes as internal ids in ISA-JSON) while in tests-dev it is fixed to just BII-S-3.

Anyhow, it makes sense to have a single tests data branch, but just be aware that if there are some broken tests it could sometimes be a problem in the test data.

@Zigur
Contributor

Zigur commented Oct 13, 2020

It seems, checking the Travis CI build history, that the commit that broke that test was this one: e0aaeb3
Its corresponding build fails: https://travis-ci.org/github/ISA-tools/isa-api/builds/701236200
The previous one was passing all right: https://travis-ci.org/github/ISA-tools/isa-api/builds/695980927
This does not make much sense, as the only changes were to create-related files. It could be that the ISA-JSON file (the BII-S-3.json that you mentioned), rather than the investigation, was changed in the meantime. I will dig into this.

EDIT: I'd say this is the commit that changed the BII-S-3.json and possibly some others: ISA-tools/ISAdatasets@0acf28f#diff-b19d780d6433e3bf7c5ece07f1adf38b04ed9ed80b7a6d902bd2bcb3512d8bd9

@djcomlab
Member Author

🤷

@djcomlab
Member Author

Oh, yeah in response to your EDIT: yes, that commit is what will have broken it in ISAdatasets!

@Zigur
Contributor

Zigur commented Oct 13, 2020

I've reverted BII-S-3.json and put the test back...now it passes locally, waiting for Travis

@Zigur
Contributor

Zigur commented Oct 13, 2020

Quick update: as develop is now passing the checks, I've realigned this PR to it.
On Travis, this PR seems to hang forever. Locally, one test is still failing, in test_sampletab2isatab, and it seems related to igraph.

======================================================================
ERROR: test_sampletab2json_GSB_3 (test_sampletab2isatab.TestSampleTab2IsaTab)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/massi/Projects/oerc/isa-api/tests/test_sampletab2isatab.py", line 37, in test_sampletab2json_GSB_3
    sampletab2isatab.convert(source_sampletab_fp=sampletab_fp, target_dir=self._tmp_dir)
  File "/Users/massi/Projects/oerc/isa-api/isatools/convert/sampletab2isatab.py", line 17, in convert
    isatab.dump(ISA, target_dir)
  File "/Users/massi/Projects/oerc/isa-api/isatools/isatab.py", line 1047, in dump
    write_study_table_files(investigation, output_path)
  File "/Users/massi/Projects/oerc/isa-api/isatools/isatab.py", line 1218, in write_study_table_files
    paths = _all_end_to_end_paths_igraph(g, u, sources)
  File "/Users/massi/Projects/oerc/isa-api/isatools/isatab.py", line 1130, in _all_end_to_end_paths_igraph
    paths += G.get_all_simple_paths(U.index(start), all_terminating_vertices)
ValueError: isatools.model.Source(name='HPSI-aimh', characteristics=[isatools.model.Characteristic(category=isatools.model.OntologyAnnotation(term='Sample Accession', term_source=None, term_accession='', comments=[]), value='SAMEA103988373', unit=None, comments=[]), isatools.model.Characteristic(category=isatools.model.OntologyAnnotation(term='Sample Description', term_source=None, term_accession='', comments=[]), value='donor of biomaterial', unit=None, comments=[]), isatools.model.Characteristic(category=isatools.model.OntologyAnnotation(term='disease state', term_source=None, term_accession='', comments=[]), value=isatools.model.OntologyAnnotation(term='', term_source=None, term_accession='', comments=[]), unit=None, comments=[]), isatools.model.Characteristic(category=isatools.model.OntologyAnnotation(term='age', term_source=None, term_accession='', comments=[]), value='', unit=isatools.model.OntologyAnnotation(term='', term_source=None, term_accession='', comments=[]), comments=[]), isatools.model.Characteristic(category=isatools.model.OntologyAnnotation(term='ethnicity', term_source=None, term_accession='', comments=[]), value='', unit=None, comments=[]), isatools.model.Characteristic(category=isatools.model.OntologyAnnotation(term='cell type', term_source=None, term_accession='', comments=[]), value=isatools.model.OntologyAnnotation(term='', term_source=None, term_accession='', comments=[]), unit=None, comments=[]), isatools.model.Characteristic(category=isatools.model.OntologyAnnotation(term='donor id', term_source=None, term_accession='', comments=[]), value='', unit=None, comments=[]), isatools.model.Characteristic(category=isatools.model.OntologyAnnotation(term='phenotype', term_source=None, term_accession='', comments=[]), value='', unit=None, comments=[])], comments=[isatools.model.Comment(name='Sample Accession', value='SAMEA103988373')]) is not in list

@djcomlab: it seems one source is not found in all_terminating_vertices, if I get it right. Any idea what it could be?
I am temporarily skipping this test just to see if Travis still hangs.

P.S. Tests seem to run twice as fast with igraph, which is great!!

@coveralls

coveralls commented Oct 13, 2020

Coverage Status

Coverage decreased (-0.2%) to 66.84% when pulling ed87dc8 on igraph-optimisation into 58606de on master.

@djcomlab
Member Author

Good news about the tests speeding up! Incidentally, loading ISA-Tabs also isn't super quick when tables get large (though not impossibly slow), but I can't think of a possible fix for that off the top of my head right now.

all_terminating_vertices shouldn't be missing a source - it should only be considering samples if the start nodes are sources (i.e. typical of study-sample table/graphs), and only considering terminating process nodes if the start nodes are samples (i.e. typical of assay table/graphs).
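
To illustrate the idea, here is a minimal pure-Python sketch of enumerating end-to-end paths from a set of start nodes to terminating nodes. It is not the ISA-API code: the function name, the edge-list input and the string node labels are assumptions made for the example, and "terminating" is taken to mean a node with no outgoing edges.

```python
def all_end_to_end_paths(edges, start_nodes):
    """Enumerate every simple path from each start node to a terminating node.

    A terminating node is taken here to be any node with no outgoing edges.
    """
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, [])

    paths = []

    def walk(node, path):
        successors = adjacency.get(node, [])
        if not successors:
            # No outgoing edges: the path ends here.
            paths.append(path)
            return
        for nxt in successors:
            if nxt not in path:  # keep the path simple (no repeated nodes)
                walk(nxt, path + [nxt])

    for start in start_nodes:
        walk(start, [start])
    return paths


# Toy study-level graph: one source feeding two sampling processes.
edges = [
    ("source1", "proc1"), ("proc1", "sample1"),
    ("source1", "proc2"), ("proc2", "sample2"),
]
paths = all_end_to_end_paths(edges, ["source1"])
print(paths)
# prints [['source1', 'proc1', 'sample1'], ['source1', 'proc2', 'sample2']]
```

For an assay-level graph the same walk would start from samples instead of sources.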

In line 1130 of isatab.py:

paths += G.get_all_simple_paths(U.index(start), all_terminating_vertices)

the ValueError is from U.index(start), which means the source named HPSI-aimh is somehow missing from the igraph graph used by _all_end_to_end_paths_igraph(). I find this strange, as I expected the igraph implementation to behave the same as the networkx one, since the logic is the same in both.
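
As a small illustration of that failure mode, list.index() raises ValueError when the object is not in the list. The sketch below uses a hypothetical stand-in Node class and a hypothetical helper, index_or_none(), that turns the exception into None so a caller could skip or report such nodes; neither is part of ISA-API.

```python
class Node:
    """Stand-in for a graph vertex object (not an isatools class)."""

    def __init__(self, name):
        self.name = name


def index_or_none(vertices, node):
    """Return the position of node in vertices, or None if it is absent."""
    try:
        return vertices.index(node)
    except ValueError:
        # list.index() raises ValueError when the item is not in the list.
        return None


a, b = Node("a"), Node("b")
vertices = [a, b]
orphan = Node("HPSI-aimh")  # created but never added to the vertex list

print(index_or_none(vertices, b))       # prints 1
print(index_or_none(vertices, orphan))  # prints None
```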

I'll take a look.

@djcomlab
Member Author

Oh, I remember why this error is probably coming out. The way in which Sample Tabs are loaded tries to map each sample to its derived "source" sample. However, in some Sample Tabs, such as GSB-3, there are sources that do not have any further sample derived from them. This results in some Source objects being created and put into Study.sources that then do not appear in any Process in the process sequence.

I actually did make a further deviation from the previous implementation in lines 1214-1217 of isatab.py. To collect the sources to feed into _all_end_to_end_paths_igraph(), I changed the code to just grab them from Study.sources or, if that happens to be an empty list, to go through the process sequence and collect the Source objects from there. This is what would have broken in sampletab2isatab.
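
A minimal sketch of the difference between the two ways of collecting sources, using stand-in dataclasses rather than the real isatools.model classes. The helper name sources_in_process_sequence() and the sample data are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Source:  # stand-in, not isatools.model.Source
    name: str


@dataclass(eq=False)
class Sample:  # stand-in, not isatools.model.Sample
    name: str


@dataclass(eq=False)
class Process:  # stand-in, not isatools.model.Process
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def sources_in_process_sequence(process_sequence):
    """Collect Source objects used as a process input, in order, without repeats."""
    seen_ids = set()
    found = []
    for process in process_sequence:
        for material in process.inputs:
            if isinstance(material, Source) and id(material) not in seen_ids:
                seen_ids.add(id(material))
                found.append(material)
    return found


used = Source("donor-1")
orphan = Source("HPSI-aimh")  # listed in the study but never used by a process
study_sources = [used, orphan]
process_sequence = [Process(inputs=[used], outputs=[Sample("sample-1")])]

connected = sources_in_process_sequence(process_sequence)
connected_ids = {id(s) for s in connected}
orphans = [s for s in study_sources if id(s) not in connected_ids]
print([s.name for s in connected], [s.name for s in orphans])
# prints ['donor-1'] ['HPSI-aimh']
```

Taking the start nodes from the process sequence means an orphan source is never looked up in a graph that does not contain it.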

I will revert this part.

@djcomlab
Member Author

djcomlab commented Oct 13, 2020

Incidentally, the test that uses the sample tab file GSB-3.txt always takes a very long time to run and might make Travis time out. I suggest permanently removing it from the tests.

The test on GSB-718.txt shouldn't take as long - on my MBP it took under 3 minutes to process.

@proccaserra
Member

> Incidentally, the test that uses the sample tab file GSB-3.txt always takes a very long time to run and might make Travis time out. I suggest permanently removing it from the tests.
>
> The test on GSB-718.txt shouldn't take as long - on my MBP it took under 3 minutes to process.

Thanks @djcomlab. We did indeed discuss this issue with @Zigur, i.e. that this test could trip Travis CI. Much appreciated. Very nice performance gain.

@Zigur
Contributor

Zigur commented Oct 15, 2020

While reviewing the PR and having a look at python-igraph, I noticed that it is released under GPLv2: https://github.com/igraph/python-igraph/blob/master/LICENSE
As far as my understanding goes, any software that incorporates a GPL-licensed piece of software must itself be released under the GPL. This would create all sorts of issues for the isa-api, which is currently released under CPAL. I remember, from a previous discussion on moving from CPAL to BSD-3, that a change of licence would require the authorization of all contributors.
Personally, I don't like the GPL precisely because it's an "integralist" licence.

@djcomlab
Member Author

Oh well, that's too bad.

@cpommier

cpommier commented Dec 2, 2020

Hi,

We tried this branch for Brapi2ISA. Thanks a lot for the help @djcomlab @Zigur @proccaserra.
Sadly, on our test dataset, the run went from 2 minutes to being killed after 16 minutes.
We can try to investigate if you wish, but for now we don't know where to begin.
We are running with the following parameters:
python3 ./plant-brapi-to-isa/brapi_to_isa.py -J -V -e https://urgi.versailles.inra.fr/faidare/brapi/v1/ -t dXJuOlVSR0kvdHJpYWwvMjM=

@djcomlab
Member Author

djcomlab commented Dec 2, 2020

Hi @cpommier, that's strange since it passes all the ISA API standard tests. When I run the same test as you with brapi-to-isa, it seems to be stuck here:

all_terminating_vertices = [U.index(x) for x in all_samples if G.outdegree(U.index(x)) == 0]

This line is trying to find the end nodes in the source to sample graph (study-level) but it shouldn't get stuck.
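
One thing worth noting about that line is that each U.index(x) call is a linear scan of the list, and it is called twice per sample. Whether this explains the hang is not established here. A sketch of the same selection with a one-off index map, assuming U is a plain Python list of vertex objects and the edges are available as (source, target) pairs (both assumptions, as is the function name), could look like this. It identifies vertices by object identity, whereas list.index() uses equality, so it is only equivalent when equal objects are the same object.

```python
def terminating_vertex_indices(vertices, edges, candidates):
    """Return indices of candidate vertices that have no outgoing edges."""
    # Build the object -> position map once instead of scanning per lookup.
    index_of = {id(v): i for i, v in enumerate(vertices)}

    out_degree = [0] * len(vertices)
    for src, _dst in edges:
        out_degree[index_of[id(src)]] += 1

    return [
        index_of[id(c)]
        for c in candidates
        if out_degree[index_of[id(c)]] == 0
    ]


# Toy graph: source -> process -> sample_a -> sample_b
source, process, sample_a, sample_b = "source", "process", "sample_a", "sample_b"
vertices = [source, process, sample_a, sample_b]
edges = [(source, process), (process, sample_a), (sample_a, sample_b)]

terminating = terminating_vertex_indices(vertices, edges, [sample_a, sample_b])
print(terminating)  # prints [3]
```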

I will try to find some time to investigate...

@cpommier

cpommier commented Dec 3, 2020

Interesting, this source-to-sample step is indeed something that intrigues us: elixir-europe/plant-brapi-to-isa#70
Thanks, and keep us informed :)

@proccaserra
Member

@cpommier, I have now run the conversion using the docker container with the endpoint details you gave and got the following:

2020-12-07 15:07:58,136 [INFO]: brapi_to_isa.py(:69) >>

trials IDs to be exported : ['dXJuOlVSR0kvdHJpYWwvMjM=']
study IDs to be exported : None
Target endpoint : https://urgi.versailles.inra.fr/faidare/brapi/v1/

2020-12-07 15:07:58,137 [DEBUG]: brapi_client.py(get_ontologies:245) >> GET http://www.obofoundry.org/registry/ontologies.jsonld
2020-12-07 15:07:58,565 [INFO]: brapi_client.py(get_trials:140) >> Return trials: ['dXJuOlVSR0kvdHJpYWwvMjM=']
2020-12-07 15:07:58,565 [DEBUG]: brapi_client.py(fetch_object:151) >> GET https://urgi.versailles.inra.fr/faidare/brapi/v1/trials/dXJuOlVSR0kvdHJpYWwvMjM=
2020-12-07 15:07:58,966 [INFO]: brapi_to_isa.py(main:292) >> we start from a set of Trials
2020-12-07 15:07:58,988 [INFO]: brapi_to_isa.py(main:296) >> Generating output in : outputdir/RixGW/
2020-12-07 15:07:59,322 [DEBUG]: brapi_client.py(_get_obs_unit_call:58) >> GOT OBSERVATIONUNIT THE 1.1 WAY
2020-12-07 15:07:59,323 [DEBUG]: brapi_client.py(fetch_objects:190) >> retrieving page 0 of None from https://urgi.versailles.inra.fr/faidare/brapi/v1/studies/dXJuOlVSR0kvc3R1ZHkvUklHVzE=/observationUnits
2020-12-07 15:07:59,323 [INFO]: brapi_client.py(fetch_objects:191) >> paging params:{'page': 0, 'pageSize': 1000}
2020-12-07 15:07:59,323 [DEBUG]: brapi_client.py(fetch_objects:194) >> GETting https://urgi.versailles.inra.fr/faidare/brapi/v1/studies/dXJuOlVSR0kvc3R1ZHkvUklHVzE=/observationUnits
2020-12-07 15:08:00,005 [DEBUG]: brapi_client.py(fetch_objects:190) >> retrieving page 1 of 5 from https://urgi.versailles.inra.fr/faidare/brapi/v1/studies/dXJuOlVSR0kvc3R1ZHkvUklHVzE=/observationUnits
2020-12-07 15:08:00,006 [INFO]: brapi_client.py(fetch_objects:191) >> paging params:{'page': 1, 'pageSize': 1000}
2020-12-07 15:08:00,006 [DEBUG]: brapi_client.py(fetch_objects:194) >> GETting https://urgi.versailles.inra.fr/faidare/brapi/v1/studies/dXJuOlVSR0kvc3R1ZHkvUklHVzE=/observationUnits
2020-12-07 15:08:00,441 [DEBUG]: brapi_client.py(fetch_objects:190) >> retrieving page 2 of 5 from https://urgi.versailles.inra.fr/faidare/brapi/v1/studies/dXJuOlVSR0kvc3R1ZHkvUklHVzE=/observationUnits
2020-12-07 15:08:00,441 [INFO]: brapi_client.py(fetch_objects:191) >> paging params:{'page': 2, 'pageSize': 1000}
2020-12-07 15:08:00,441 [DEBUG]: brapi_client.py(fetch_objects:194) >> GETting https://urgi.versailles.inra.fr/faidare/brapi/v1/studies/dXJuOlVSR0kvc3R1ZHkvUklHVzE=/observationUnits
2020-12-07 15:08:00,909 [DEBUG]: brapi_client.py(fetch_objects:190) >> retrieving page 3 of 5 from https://urgi.versailles.inra.fr/faidare/brapi/v1/studies/dXJuOlVSR0kvc3R1ZHkvUklHVzE=/observationUnits
2020-12-07 15:08:00,909 [INFO]: brapi_client.py(fetch_objects:191) >> paging params:{'page': 3, 'pageSize': 1000}
2020-12-07 15:08:00,910 [DEBUG]: brapi_client.py(fetch_objects:194) >> GETting https://urgi.versailles.inra.fr/faidare/brapi/v1/studies/dXJuOlVSR0kvc3R1ZHkvUklHVzE=/observationUnits
2020-12-07 15:08:01,503 [DEBUG]: brapi_client.py(fetch_objects:190) >> retrieving page 4 of 5 from https://urgi.versailles.inra.fr/faidare/brapi/v1/studies/dXJuOlVSR0kvc3R1ZHkvUklHVzE=/observationUnits
2020-12-07 15:08:01,503 [INFO]: brapi_client.py(fetch_objects:191) >> paging params:{'page': 4, 'pageSize': 1000}
2020-12-07 15:08:01,503 [DEBUG]: brapi_client.py(fetch_objects:194) >> GETting https://urgi.versailles.inra.fr/faidare/brapi/v1/studies/dXJuOlVSR0kvc3R1ZHkvUklHVzE=/observationUnits
2020-12-07 15:08:02,015 [INFO]: brapi_to_isa_converter.py(get_obs_levels:78) >> Observation Levels in study: block>plot,block>plot>plant
2020-12-07 15:08:02,015 [DEBUG]: brapi_client.py(fetch_object:151) >> GET https://urgi.versailles.inra.fr/faidare/brapi/v1/studies/dXJuOlVSR0kvc3R1ZHkvUklHVzE=
2020-12-07 15:08:02,356 [INFO]: brapi_to_isa_converter.py(create_isa_study:223) >> The observation level block>plot is not supported by MIAPPE at this moment and will not be validated.
2020-12-07 15:08:02,357 [INFO]: brapi_to_isa_converter.py(create_isa_study:224) >> Following observation levels are supported: ['study', 'block', 'sub-block', 'plot', 'sub-plot', 'pot', 'plant'].
2020-12-07 15:08:02,369 [INFO]: brapi_to_isa_converter.py(create_isa_study:223) >> The observation level block>plot>plant is not supported by MIAPPE at this moment and will not be validated.
2020-12-07 15:08:02,370 [INFO]: brapi_to_isa_converter.py(create_isa_study:224) >> Following observation levels are supported: ['study', 'block', 'sub-block', 'plot', 'sub-plot', 'pot', 'plant'].
2020-12-07 15:08:02,370 [INFO]: brapi_to_isa_converter.py(create_isa_study:239) >> Number of ISA assays: 2
2020-12-07 15:08:02,371 [DEBUG]: brapi_client.py(fetch_objects:190) >> retrieving page 0 of None from https://urgi.versailles.inra.fr/faidare/brapi/v1/studies/dXJuOlVSR0kvc3R1ZHkvUklHVzE=/germplasm
2020-12-07 15:08:02,371 [INFO]: brapi_client.py(fetch_objects:191) >> paging params:{'page': 0, 'pageSize': 1000}
2020-12-07 15:08:02,371 [DEBUG]: brapi_client.py(fetch_objects:194) >> GETting https://urgi.versailles.inra.fr/faidare/brapi/v1/studies/dXJuOlVSR0kvc3R1ZHkvUklHVzE=/germplasm
2020-12-07 15:08:02,787 [DEBUG]: brapi_client.py(get_taxonId:232) >> GET https://www.ebi.ac.uk/ena/taxonomy/rest/any-name/Vitis%20vinifera
2020-12-07 15:08:03,826 [INFO]: model.py(graph:1544) >> Building graph for object: Study(
identifier=dXJuOlVSR0kvc3R1ZHkvUklHVzE=
filename=s_dXJuOlVSR0kvc3R1ZHkvUklHVzE=.txt
title=RIGW section I
description=NA in endpoint
submission_date=
public_release_date=
contacts=1 Person objects
design_descriptors=1 OntologyAnnotation objects
publications=0 Publication objects
factors=0 StudyFactor objects
protocols=3 Protocol objects
assays=2 Assay objects
sources=122 Source objects
samples=1132 Sample objects
process_sequence=1132 Process objects
other_material=0 Material objects
characteristic_categories=0 OntologyAnnots
comments=10 Comment objects
units=0 Unit objects
)
2020-12-07 15:08:07,759 [INFO]: model.py(graph:1544) >> Building graph for object: Study(
identifier=dXJuOlVSR0kvc3R1ZHkvUklHVzE=
filename=s_dXJuOlVSR0kvc3R1ZHkvUklHVzE=.txt
title=RIGW section I
description=NA in endpoint
submission_date=
public_release_date=
contacts=1 Person objects
design_descriptors=1 OntologyAnnotation objects
publications=0 Publication objects
factors=0 StudyFactor objects
protocols=3 Protocol objects
assays=2 Assay objects
sources=122 Source objects
samples=1132 Sample objects
process_sequence=1132 Process objects
other_material=0 Material objects
characteristic_categories=0 OntologyAnnots
comments=10 Comment objects
units=0 Unit objects
)
2020-12-07 15:08:11,712 [INFO]: model.py(graph:1544) >> Building graph for object: Study(
identifier=dXJuOlVSR0kvc3R1ZHkvUklHVzE=
filename=s_dXJuOlVSR0kvc3R1ZHkvUklHVzE=.txt
title=RIGW section I
description=NA in endpoint
submission_date=
public_release_date=
contacts=1 Person objects
design_descriptors=1 OntologyAnnotation objects
publications=0 Publication objects
factors=0 StudyFactor objects
protocols=3 Protocol objects
assays=2 Assay objects
sources=122 Source objects
samples=1132 Sample objects
process_sequence=1132 Process objects
other_material=0 Material objects
characteristic_categories=0 OntologyAnnots
comments=10 Comment objects
units=0 Unit objects
)
2020-12-07 15:08:41,672 [INFO]: isatab.py(_all_end_to_end_paths:1152) >> Found 1132 paths!
2020-12-07 15:08:41,788 [INFO]: isatab.py(write_study_table_files:1330) >> Rendered 1132 paths
2020-12-07 15:08:41,801 [INFO]: isatab.py(write_study_table_files:1337) >> Writing 1132 rows
2020-12-07 15:08:41,847 [INFO]: model.py(graph:1544) >> Building graph for object: Assay(
measurement_type=phenotyping
technology_type=block>plot level analysis
technology_platform=
filename=a_dXJuOlVSR0kvc3R1ZHkvUklHVzE=_block>plot.txt
data_files=0 DataFile objects
samples=1501 Sample objects
process_sequence=3002 Process objects
other_material=0 Material objects
characteristic_categories=1 OntologyAnnots
comments=0 Comment objects
units=0 Unit objects
)
2020-12-07 15:08:49,845 [INFO]: model.py(graph:1544) >> Building graph for object: Assay(
measurement_type=phenotyping
technology_type=block>plot level analysis
technology_platform=
filename=a_dXJuOlVSR0kvc3R1ZHkvUklHVzE=_block>plot.txt
data_files=0 DataFile objects
samples=1501 Sample objects
process_sequence=3002 Process objects
other_material=0 Material objects
characteristic_categories=1 OntologyAnnots
comments=0 Comment objects
units=0 Unit objects
)
2020-12-07 15:08:57,868 [INFO]: model.py(graph:1544) >> Building graph for object: Assay(
measurement_type=phenotyping
technology_type=block>plot level analysis
technology_platform=
filename=a_dXJuOlVSR0kvc3R1ZHkvUklHVzE=_block>plot.txt
data_files=0 DataFile objects
samples=1501 Sample objects
process_sequence=3002 Process objects
other_material=0 Material objects
characteristic_categories=1 OntologyAnnots
comments=0 Comment objects
units=0 Unit objects
)
2020-12-07 15:09:08,353 [INFO]: isatab.py(_all_end_to_end_paths:1152) >> Found 385 paths!
2020-12-07 15:09:08,414 [INFO]: model.py(graph:1544) >> Building graph for object: Assay(
measurement_type=phenotyping
technology_type=block>plot>plant level analysis
technology_platform=
filename=a_dXJuOlVSR0kvc3R1ZHkvUklHVzE=_block>plot>plant.txt
data_files=0 DataFile objects
samples=2690 Sample objects
process_sequence=5380 Process objects
other_material=0 Material objects
characteristic_categories=1 OntologyAnnots
comments=0 Comment objects
units=0 Unit objects
)
2020-12-07 15:09:23,185 [INFO]: model.py(graph:1544) >> Building graph for object: Assay(
measurement_type=phenotyping
technology_type=block>plot>plant level analysis
technology_platform=
filename=a_dXJuOlVSR0kvc3R1ZHkvUklHVzE=_block>plot>plant.txt
data_files=0 DataFile objects
samples=2690 Sample objects
process_sequence=5380 Process objects
other_material=0 Material objects
characteristic_categories=1 OntologyAnnots
comments=0 Comment objects
units=0 Unit objects
)
2020-12-07 15:09:37,819 [INFO]: model.py(graph:1544) >> Building graph for object: Assay(
measurement_type=phenotyping
technology_type=block>plot>plant level analysis
technology_platform=
filename=a_dXJuOlVSR0kvc3R1ZHkvUklHVzE=_block>plot>plant.txt
data_files=0 DataFile objects
samples=2690 Sample objects
process_sequence=5380 Process objects
other_material=0 Material objects
characteristic_categories=1 OntologyAnnots
comments=0 Comment objects
units=0 Unit objects
)
2020-12-07 15:09:57,673 [INFO]: isatab.py(_all_end_to_end_paths:1152) >> Found 764 paths!
2020-12-07 15:09:57,769 [INFO]: brapi_to_isa.py(main:411) >> ISA-TAB DUMP DONE!...
2020-12-07 15:09:58,174 [DEBUG]: brapi_client.py(_get_obs_var_call:83) >> GOT OBSERVATIONVARIABLE THE 1.0 WAY
2020-12-07 15:09:58,175 [DEBUG]: brapi_client.py(fetch_objects:190) >> retrieving page 0 of None from https://urgi.versailles.inra.fr/faidare/brapi/v1/studies/dXJuOlVSR0kvc3R1ZHkvUklHVzE=/observationVariables
2020-12-07 15:09:58,175 [INFO]: brapi_client.py(fetch_objects:191) >> paging params:{'page': 0, 'pageSize': 1000}
2020-12-07 15:09:58,176 [DEBUG]: brapi_client.py(fetch_objects:194) >> GETting https://urgi.versailles.inra.fr/faidare/brapi/v1/studies/dXJuOlVSR0kvc3R1ZHkvUklHVzE=/observationVariables
2020-12-07 15:09:58,518 [INFO]: brapi_to_isa.py(write_records_to_file:217) >> Writing to file
2020-12-07 15:09:59,075 [INFO]: brapi_to_isa.py(main:434) >> Generating data files
2020-12-07 15:09:59,075 [INFO]: brapi_to_isa.py(write_records_to_file:217) >> Writing to file
2020-12-07 15:10:00,723 [INFO]: brapi_to_isa.py(main:434) >> Generating data files
2020-12-07 15:10:00,724 [INFO]: brapi_to_isa.py(write_records_to_file:217) >> Writing to file
2020-12-07 15:10:00,978 [INFO]: brapi_to_isa.py(main:479) >> CONVERSION AND VALIDATION FINISHED

The output I obtain contains 2 distinct ISA Assay tables, each corresponding to one of the BrAPI "Levels" that are currently supported. My recollection from the work we did during the biohackathon is that we had to let users choose, so we allowed several options.

Questions:

  1. Have you changed anything since reporting the failure / execution kill last week?
    I presume the latest code implements what is described in
    "Improve performances by having one assay by level" (elixir-europe/plant-brapi-to-isa#70).

  2. igraph is a no-go owing to licensing issues from an ISA-API standpoint. Aren't you concerned by this point?

bw.
(following also on the elixir-europe/plant-brapi-to-isa#70 ticket)

@djcomlab
Member Author

djcomlab commented Dec 7, 2020

About the igraph license issue: it's only a no-go for us releasing the modifications in ISA API.

But plant-brapi-to-isa is BSD-3, which is GPL compatible, so they can include/link igraph in their requirements. Not sure how it would work in practice though.

@proccaserra
Member

@djcomlab I guess we need more information from @cpommier, as I managed to run the code from the elixir repo and it doesn't seem to be using graph. My gut feeling is that the underlying issue is a borked mapping between BRAPI observation/observation Levels and ISA Assay. I don't think we can solve this without a call.

@Zigur Zigur added this to the 0.13 milestone Jul 7, 2021
@terazus
Collaborator

terazus commented Oct 20, 2021

This has been addressed by PR #403.

@terazus terazus closed this Oct 20, 2021
@terazus terazus deleted the igraph-optimisation branch October 20, 2021 10:59