Add context storage benchmarking #144
Conversation
return Context(
    labels={i: (f"flow_{i}", f"node_{i}") for i in range(dialog_len)},
    requests={i: Message(text=f"request_{i}") for i in range(dialog_len)},
    responses={i: Message(text=f"response_{i}") for i in range(dialog_len)},
    misc={str(i): i for i in range(misc_len)},
)
Maybe we should make message size random in length to reflect the irregular nature of real messages?
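A minimal sketch of how message bodies could be randomized in length, as suggested. `random_text` is a hypothetical helper, not part of this PR:

```python
import random
import string

def random_text(min_len: int = 10, max_len: int = 200) -> str:
    """Produce a message body of random length to mimic the irregular
    size of real user messages (hypothetical helper, not in the PR)."""
    length = random.randint(min_len, max_len)
    return "".join(random.choices(string.ascii_letters + " ", k=length))

# Hypothetical usage inside the context generator:
# requests={i: Message(text=random_text()) for i in range(dialog_len)}
```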
# check returned context
if actual_context != context:
    raise RuntimeError(f"True context:\n{context}\nActual context:\n{actual_context}")
How should this succeed if we read, say, only the last 3 requests from the context storage?
We should account for that, or manually set the read/write policy of the context storage to ALL.
But that should be done only after the partial context updates are merged.
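If partial reads are allowed, the strict equality check could be relaxed to compare only the fields the storage actually returns. A hypothetical sketch, with plain dicts standing in for `Context` objects (`contexts_match` is not part of the PR):

```python
from typing import Optional

def contexts_match(expected: dict, actual: dict, last_n_requests: Optional[int] = None) -> bool:
    """Compare two context snapshots, optionally restricting the request
    history to the last N entries (for storages with a partial read policy)."""
    if last_n_requests is not None:
        # Keep only the most recent N request keys in the expected snapshot
        keys = sorted(expected["requests"])[-last_n_requests:]
        expected = {**expected, "requests": {k: expected["requests"][k] for k in keys}}
    return expected == actual
```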
    f"Size of one context: {context_size} ({tqdm.format_sizeof(context_size, divisor=1024)})"
)

print(f"Starting benchmarking with following parameters:\n{benchmark_config}")
Maybe return the output as a string / list of strings instead of printing?
- remove export as dataframe
- add methods to get context/message/misc
- add dimensionality to misc and message
- add context update timing
# Conflicts:
#   .github/workflows/test_coverage.yml
#   setup.py
Please consider adding the files created by both benchmarking and the benchmark visualization website to .gitignore and .dockerignore.
Added benchmark files to ignore files:
2420455
Moreover, the project could benefit from separate make targets for clearing benchmark data.
Moved database directory inside the benchmark directory:
4263b7e
So now everything generated by benchmark_dbs.py is stored in one place and can easily be removed.
Please also consider stopping and removing all the docker containers after every benchmark.
This seems out of scope for the task that benchmarking tries to solve.
Feels like these modifications should be made on the client side (by writing custom scripts for benchmarking).
from uuid import uuid4
import pathlib
from time import perf_counter
import typing as tp
Replaced module import with object imports:
2b2c2bb
context_storage: DBContextStorage,
context: Context,
context_num: int,
context_updater=None,
Added type annotation:
acb0557
def time_context_read_write(
    context_storage: DBContextStorage,
The benchmark requires the clear method from the context storage. Also, I don't see why someone would want to benchmark a plain Dict as a context storage.
I think a much cleaner solution would be to create a DBContextStorage wrapper around Dict.
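A minimal sketch of what such a wrapper might look like. `DictContextStorage` is hypothetical and only implements the subset of the storage interface the benchmark would touch, including `clear`:

```python
class DictContextStorage:
    """Hypothetical in-memory wrapper exposing the subset of the
    DBContextStorage interface that the benchmark relies on."""

    def __init__(self):
        self._data = {}

    def __setitem__(self, key, value):
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __len__(self):
        return len(self._data)

    def clear(self):
        # The benchmark needs this to reset state between runs
        self._data.clear()
```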
uri: str
"""URI of the context storage."""
factory_module: str = "dff.context_storages"
It's not that relevant, but it doesn't require much and might be helpful in case someone wants to benchmark their own context storage.
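For illustration, resolving a factory from a dotted module path string could be sketched like this (`storage_factory` is a hypothetical helper; the actual PR code may differ):

```python
import importlib

def storage_factory(factory_module: str, factory_name: str):
    """Resolve a factory callable from a dotted module path, so a config
    can point at a user-supplied context storage implementation."""
    module = importlib.import_module(factory_module)
    return getattr(module, factory_name)

# Hypothetical usage, assuming a factory function exists in the module:
# factory = storage_factory("dff.context_storages", "context_storage_factory")
# storage = factory(uri)
```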
class Config:
    allow_mutation = False
Moved param to model kwargs:
fbb3a5f
def _get_dict(dimensions: tp.Tuple[int, ...]):
    if len(dimensions) < 2:
        return "." * dimensions[0]
Strings are now randomized:
62a33d9
utils/db_benchmark/benchmark_dbs.py (outdated diff)
}

# benchmark
benchmark_dir = pathlib.Path("benchmarks")
Added comments:
c1f6fbe
)


def report(
Done:
85df6d1
"""
from_dialog_len: int = 300
"""Starting dialog len of a context."""
to_dialog_len: int = 311
No, there are no statistics behind this number; it's just so that we'd have 10 update steps. I don't think that should be explained in the doc.
Also, I don't think that counts as a magic number.
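For what it's worth, the arithmetic behind the choice (assuming a step of 1) can be made explicit:

```python
# Why to_dialog_len is 311: stepping dialog length from 300 up to (but not
# including) 311 yields 11 snapshots, i.e. exactly 10 update steps between them.
from_dialog_len, to_dialog_len = 300, 311
dialog_lens = list(range(from_dialog_len, to_dialog_len))
update_steps = len(dialog_lens) - 1
```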
def _get_dict(dimensions: tp.Tuple[int, ...]):
    if len(dimensions) < 2:
        return "." * dimensions[0]
    return {i: _get_dict(dimensions[1:]) for i in range(dimensions[0])}
Generalized BenchmarkConfig:
87b1820
Now users can create their own BenchmarkConfigs using a simple interface supported by the streamlit app and the report function.
I don't think random configs should be included in the library, though: I don't see a good use case for that, and it would be too complicated.
They can already be uploaded via the `Upload benchmark results` interface
Since files can no longer be added via their path on the filesystem, deleted benchmarks are always the uploaded ones.
# Conflicts:
#   docs/source/conf.py
#   setup.py

# Conflicts:
#   setup.py
Description
Add the dff.utils.benchmark.context_storage module, which contains functions for benchmarking context storages. This PR also includes all modules inside dff.utils (benchmarking + caching) during doc building.
Checklist
(/utils/; documentation will potentially fail if served at /docs/build)