feat(raw): add arrow support in backend #2083

MartinBelthle · 2024-07-05T17:12:48Z

There are a few things to note when using the Arrow format:
1- For output files, the index is now stored inside the DataFrame as the column "Index".
2- The column names are now strings (forced by pyarrow).

laurent-laporte-pro · 2024-07-06T12:42:05Z

antarest/study/web/raw_studies_blueprint.py

@@ -59,6 +60,13 @@
    ".json": ("application/json", "utf-8"),
 }

+
+class MATRIX_FORMAT(EnumIgnoreCase):
+    JSON = "json"


Depending on the use case in the application, you may have "json-split" (the general case) or "json-index" (for the Table Forms of the scenario builder, for example).

Example:

>>> import pandas as pd
>>> df = pd.DataFrame(data=[[1, 2, 3.14], [4, 5, 6.18]], index=["00:00", "01:00"], columns=["TS-1", "TS-2", "TS-3"])
>>> df.to_json(orient="split")
'{"columns":["TS-1","TS-2","TS-3"],"index":["00:00","01:00"],"data":[[1,2,3.14],[4,5,6.18]]}'
>>> df.to_json(orient="index")
'{"00:00":{"TS-1":1,"TS-2":2,"TS-3":3.14},"01:00":{"TS-1":4,"TS-2":5,"TS-3":6.18}}'
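
The two layouts carry the same information, which the following sketch tries to make explicit by converting a "split" payload into an "index" payload using only the standard library. The function name `split_to_index` is hypothetical and is not part of the PR.

```python
import json


def split_to_index(payload: str) -> str:
    """Convert a JSON 'split' document into the 'index' layout."""
    doc = json.loads(payload)
    # Pair each row label with a {column name: value} mapping.
    result = {
        row_label: dict(zip(doc["columns"], row))
        for row_label, row in zip(doc["index"], doc["data"])
    }
    # Compact separators match the output style of DataFrame.to_json.
    return json.dumps(result, separators=(",", ":"))


split_doc = (
    '{"columns":["TS-1","TS-2","TS-3"],'
    '"index":["00:00","01:00"],'
    '"data":[[1,2,3.14],[4,5,6.18]]}'
)
index_doc = split_to_index(split_doc)
print(index_doc)
```

The "split" layout states the column names once, whereas the "index" layout repeats them for every row.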

antarest/study/web/raw_studies_blueprint.py

antarest/study/storage/abstract_storage_service.py

antarest/study/storage/rawstudy/model/filesystem/matrix/matrix.py

antarest/study/storage/rawstudy/model/filesystem/matrix/input_series_matrix.py

antarest/study/web/raw_studies_blueprint.py

sylvlecl · 2024-07-10T07:25:09Z

antarest/study/web/raw_studies_blueprint.py

+        else:
+            real_format = "json" if formatted else "bytes"
+
+        output = study_service.get(uuid, path, depth=depth, format=real_format, params=parameters)


From a global point of view, I think it's weird to push the formatting all the way down the call stack and serialize the matrix there.

Usually, applications manipulate "objects" in memory, and serialize them to the requested format only when needed.

So here, for example, I think it would make more sense to retrieve a DataFrame object and serialize it here.

Do you think that's possible? Maybe in another PR, since it has a bigger impact on the codebase?
In particular, we still want matrices to be formatted as JSON when getting a whole tree from the file tree.

MartinBelthle · 2024-07-10

I think it's possible, and I agree it's a better way to do it. I tried it in this PR, but it's too much work, so we should tackle it in a separate PR.

laurent-laporte-pro · 2024-07-12

Remember that some binary files in the study directory are not DataFrames at all: *.ico, XML files, user-defined files stored in the user directory…
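
The design discussed in this thread could be sketched as follows: the storage layer returns an in-memory object, and the web layer alone decides how to serialize it. Everything here is an assumption made for illustration: `Matrix` is a hypothetical stand-in for a DataFrame, and `serialize` is not a function from the codebase. Raw bytes are passed through untouched to cover files that are not matrices.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Matrix:
    """Hypothetical in-memory stand-in for a DataFrame."""

    columns: List[str]
    index: List[str]
    data: List[List[float]]


def serialize(obj: Union[Matrix, bytes], fmt: str) -> Tuple[bytes, str]:
    """Serialize at the web layer; return (body, media type)."""
    if isinstance(obj, bytes):
        # Not a matrix (icon, XML, user file...): no formatting applies.
        return obj, "application/octet-stream"
    if fmt == "json":
        doc = {"columns": obj.columns, "index": obj.index, "data": obj.data}
        return json.dumps(doc).encode("utf-8"), "application/json"
    # Other formats (for example Arrow) would get their own branch here.
    raise ValueError(f"unsupported matrix format: {fmt!r}")


matrix = Matrix(columns=["TS-1"], index=["00:00"], data=[[1.0]])
body, media_type = serialize(matrix, "json")
raw_body, raw_type = serialize(b"\x00\x01", "json")
```

With this split, adding a new output format would only touch the serialization function rather than every layer of the call stack.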

sylvlecl · 2024-07-10T09:35:11Z

tests/integration/raw_studies_blueprint/test_fetch_raw_data.py

@@ -174,33 +155,63 @@ def test_get_study(

        # If we ask for a matrix, we should have a JSON content if formatted is True
        rel_path = "/input/links/de/fr"


We should probably test it with an output matrix too, since input and output matrices are handled by different code paths.

MartinBelthle · 2024-07-10

I've duplicated the input tests to cover output matrices. The test code is a bit more complicated, but that doesn't bother me.

antarest/study/web/raw_studies_blueprint.py

MartinBelthle added the draft label Jul 5, 2024

MartinBelthle self-assigned this Jul 5, 2024

pull-request-size bot added the size/L label Jul 5, 2024

laurent-laporte-pro approved these changes Jul 6, 2024


MartinBelthle added waiting for review and removed draft labels Jul 9, 2024

MartinBelthle requested a review from sylvlecl July 9, 2024 10:14

sylvlecl requested changes Jul 10, 2024


MartinBelthle added changes requested and removed waiting for review labels Jul 10, 2024

MartinBelthle force-pushed the add-arrow-support-in-backend branch 2 times, most recently from b0b2ee2 to bad58a7 Compare July 11, 2024 07:50

pull-request-size bot added size/XL and removed size/L labels Jul 11, 2024

MartinBelthle requested a review from sylvlecl July 11, 2024 16:07

MartinBelthle added waiting for review and removed changes requested labels Jul 11, 2024

MartinBelthle added 13 commits July 18, 2024 18:01

first draft

02c8851

remove useless import

b91ae95

fix little issue

0202d5d

little fix

f382780

make json a default value

40c22cf

fix some tests

6f0824e

fix last tests

3fcb748

add pyarrow to requirements

3a3308a

last change

2cb0d7e

remove useless import

6d411cd

add tests

e151750

add arrow support in put raw

529de98

resolve comments

6452e54

MartinBelthle added 4 commits July 18, 2024 18:01

change doc

f358549

resolve some issues

cb7a788

fix another test

f5a4623

resolve last test

9ac959f

MartinBelthle force-pushed the add-arrow-support-in-backend branch from 850cff5 to 9ac959f Compare July 18, 2024 16:01

MartinBelthle added 10 commits July 19, 2024 09:47

resolve conflicts with dev

51446d2

rebase with dev

f496108

resolve conflicts

e7b7703

resolve issue with dev

beb23fc

fix tets

8311d89

fix lint

0a7ccf3

Merge branch 'dev' into add-arrow-support-in-backend

fa505f5

Merge branch 'dev' into add-arrow-support-in-backend

4a5732d

Merge branch 'dev' into add-arrow-support-in-backend

123419e

Merge branch 'dev' into add-arrow-support-in-backend

59f8beb

MartinBelthle removed the waiting for review label Sep 20, 2024
