Add context storage benchmarking #144
Conversation
return Context(
    labels={i: (f"flow_{i}", f"node_{i}") for i in range(dialog_len)},
    requests={i: Message(text=f"request_{i}") for i in range(dialog_len)},
    responses={i: Message(text=f"response_{i}") for i in range(dialog_len)},
    misc={str(i): i for i in range(misc_len)},
)
Maybe we should make message size random in length to reflect the irregular nature of real messages?
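A minimal sketch of how message bodies could be randomized in length, as suggested. `random_text` is a hypothetical helper, not part of this PR:

```python
import random
import string

def random_text(min_len: int = 10, max_len: int = 200) -> str:
    """Produce a message body of random length to mimic the irregular
    size of real user messages (hypothetical helper, not in the PR)."""
    length = random.randint(min_len, max_len)
    return "".join(random.choices(string.ascii_letters + " ", k=length))

# Hypothetical usage inside the context generator:
# requests={i: Message(text=random_text()) for i in range(dialog_len)}
```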
# check returned context
if actual_context != context:
    raise RuntimeError(f"True context:\n{context}\nActual context:\n{actual_context}")
How should this succeed if we read, say, only the last 3 requests from the context storage?
We should account for that, or manually set the read/write policy of the context storage to ALL.
But that should be done only after the partial context updates are merged.
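If partial reads are allowed, the strict equality check could be relaxed to compare only the fields the storage actually returns. A hypothetical sketch, with plain dicts standing in for `Context` objects (`contexts_match` is not part of the PR):

```python
from typing import Optional

def contexts_match(expected: dict, actual: dict, last_n_requests: Optional[int] = None) -> bool:
    """Compare two context snapshots, optionally restricting the request
    history to the last N entries (for storages with a partial read policy)."""
    if last_n_requests is not None:
        # Keep only the most recent N request keys in the expected snapshot
        keys = sorted(expected["requests"])[-last_n_requests:]
        expected = {**expected, "requests": {k: expected["requests"][k] for k in keys}}
    return expected == actual
```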
    f"Size of one context: {context_size} ({tqdm.format_sizeof(context_size, divisor=1024)})"
)

print(f"Starting benchmarking with following parameters:\n{benchmark_config}")
Maybe return the output as a string / list of strings instead of printing?
- remove export as dataframe
- add methods to get context/message/misc
- add dimensionality to misc and message
- add context update timing
# Conflicts:
#   .github/workflows/test_coverage.yml
#   setup.py
Please consider adding the files created by both benchmarking and the benchmark visualization website to .gitignore and .dockerignore.
Added benchmark files to ignore files:
2420455
Moreover, the project could benefit from separate make targets for clearing benchmark data.
Moved database directory inside the benchmark directory:
4263b7e
So now everything generated by benchmark_dbs.py is stored in one place and can easily be removed.
Please also consider stopping and removing all the docker containers after every benchmark.
This seems out of scope for the task that benchmarking tries to solve.
Feels like these modifications should be made on the client side (by writing custom scripts for benchmarking).
from uuid import uuid4
import pathlib
from time import perf_counter
import typing as tp
Replaced module import with object imports:
2b2c2bb
context_storage: DBContextStorage,
context: Context,
context_num: int,
context_updater=None,
Added type annotation:
acb0557
def time_context_read_write(
    context_storage: DBContextStorage,
The benchmark requires the clear method from the context storage. Also, I don't see why someone would want to benchmark a plain Dict as a context storage.
I think a much cleaner solution would be to create a DBContextStorage wrapper around Dict.
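A minimal sketch of what such a wrapper might look like. `DictContextStorage` is hypothetical and only implements the subset of the storage interface the benchmark would touch, including `clear`:

```python
class DictContextStorage:
    """Hypothetical in-memory wrapper exposing the subset of the
    DBContextStorage interface that the benchmark relies on."""

    def __init__(self):
        self._data = {}

    def __setitem__(self, key, value):
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __len__(self):
        return len(self._data)

    def clear(self):
        # The benchmark needs this to reset state between runs
        self._data.clear()
```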
uri: str
"""URI of the context storage."""
factory_module: str = "dff.context_storages"
It's not that relevant, but it doesn't require much and might be helpful in case someone wants to benchmark their own context storage.
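For illustration, resolving a factory from a dotted module path string could be sketched like this (`storage_factory` is a hypothetical helper; the actual PR code may differ):

```python
import importlib

def storage_factory(factory_module: str, factory_name: str):
    """Resolve a factory callable from a dotted module path, so a config
    can point at a user-supplied context storage implementation."""
    module = importlib.import_module(factory_module)
    return getattr(module, factory_name)

# Hypothetical usage, assuming a factory function exists in the module:
# factory = storage_factory("dff.context_storages", "context_storage_factory")
# storage = factory(uri)
```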
class Config:
    allow_mutation = False
Moved param to model kwargs:
fbb3a5f
def _get_dict(dimensions: tp.Tuple[int, ...]):
    if len(dimensions) < 2:
        return "." * dimensions[0]
Strings are now randomized:
62a33d9
utils/db_benchmark/benchmark_dbs.py (outdated diff)
}

# benchmark
benchmark_dir = pathlib.Path("benchmarks")
Added comments:
c1f6fbe
)


def report(
Done:
85df6d1
"""
from_dialog_len: int = 300
"""Starting dialog len of a context."""
to_dialog_len: int = 311
No, there are no statistics behind this number; it's just so that we'd have 10 update steps. I don't think that should be explained in the doc.
Also, I don't think that counts as a magic number.
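For what it's worth, the arithmetic behind the choice (assuming a step of 1) can be made explicit:

```python
# Why to_dialog_len is 311: stepping dialog length from 300 up to (but not
# including) 311 yields 11 snapshots, i.e. exactly 10 update steps between them.
from_dialog_len, to_dialog_len = 300, 311
dialog_lens = list(range(from_dialog_len, to_dialog_len))
update_steps = len(dialog_lens) - 1
```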
def _get_dict(dimensions: tp.Tuple[int, ...]):
    if len(dimensions) < 2:
        return "." * dimensions[0]
    return {i: _get_dict(dimensions[1:]) for i in range(dimensions[0])}
Generalized BenchmarkConfig:
87b1820
Now users can create their own BenchmarkConfigs using a simple interface supported by the streamlit app and the report function.
I don't think random configs should be included in the library, though: I don't see a good use case for that, and it would be too complicated.
They can already be uploaded via the `Upload benchmark results` interface
Since files can no longer be added via their path on the filesystem, deleted benchmarks are always the uploaded ones.
# Conflicts:
#   docs/source/conf.py
#   setup.py

# Conflicts:
#   setup.py
Description
Add the dff.utils.benchmark.context_storage module, which contains functions for benchmarking context storages. This PR also includes all modules inside dff.utils (benchmarking + caching) during doc building.
Checklist
(/utils/; documentation will potentially fail if served at /docs/build)