DuckDB: Add in-memory results #274
end = timeit.default_timer()
print(end - start)

with open('queries.sql', 'r') as file:
In the ClickBench repository,
- in duckdb-memory/, the load and query steps are combined (this file), whereas
- in duckdb/, the load and query steps are split (load.py and query.py).
I guess it would be easier to figure out the difference between both variants if this is made consistent?
@rschu1ze Thanks for the feedback!
The duckdb-memory implementation combines the load and the queries in a single script because DuckDB runs in-process: without on-disk persistence, the load and the queries have to run in the same process to keep the data in memory.
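To illustrate the in-process constraint, here is a minimal sketch. It deliberately uses Python's standard-library sqlite3 module as a stand-in for DuckDB (both are in-process engines), and the `hits` toy table and its rows are hypothetical, not the ClickBench schema or the script from this PR. It only shows that in-memory data is visible through the connection that loaded it and not through a fresh one.

```python
import sqlite3
import timeit

# Stand-in for DuckDB: sqlite3 is also an in-process engine, and each
# ':memory:' connection owns a private database living in this process.
con = sqlite3.connect(':memory:')

# Load step (hypothetical toy table, not the ClickBench schema).
start = timeit.default_timer()
con.execute('CREATE TABLE hits (id INTEGER, url TEXT)')
con.executemany(
    'INSERT INTO hits VALUES (?, ?)',
    [(i, f'https://example.com/{i}') for i in range(1000)],
)
con.commit()
load_seconds = timeit.default_timer() - start

# Query step: works because it reuses the connection that holds the data.
row_count = con.execute('SELECT COUNT(*) FROM hits').fetchone()[0]

# A second in-memory connection starts empty, much like a separate query
# process would if nothing had been persisted to disk.
other = sqlite3.connect(':memory:')
try:
    other.execute('SELECT COUNT(*) FROM hits')
    visible_elsewhere = True
except sqlite3.OperationalError:
    visible_elsewhere = False

print(load_seconds, row_count, visible_elsewhere)
```

With a split load.py / query.py layout, the query process would be in the position of `other` here unless the data is written to disk in between.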
There are many DuckDB implementations for ClickBench now, so unifying them would make sense. The likely approach is a simple Python script that uses system calls (such as https://github.com/ClickHouse/ClickBench/pull/274/files#diff-1d8a1d4c9a1a7c7e98e5ac68a2ad1c33b9b7125f482922a040026bfbc2976cffR27) to enforce cache eviction. Would this approach work for the duckdb/ implementations?
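For reference, a rough sketch of what cache eviction via a system call could look like on Linux. This is an assumption about the general technique, not a copy of the linked line: the helper names `drop_caches_command` and `evict_os_cache` are hypothetical, and the command needs sudo rights on the benchmark machine.

```python
import subprocess


def drop_caches_command():
    # 'sync' flushes dirty pages first; writing 3 to drop_caches asks the
    # Linux kernel to drop the page cache plus dentries and inodes.
    return [
        'sh', '-c',
        'sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null',
    ]


def evict_os_cache(runner=subprocess.run):
    # 'runner' is injectable (an assumption made for this sketch) so the
    # helper can be exercised without root privileges.
    return runner(drop_caches_command(), check=True)
```

A unified script would call `evict_os_cache()` before each cold run; nothing is executed until that call is made.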
Yes, that approach would work and a unification would generally make sense. But let's do this separately (--> new PR).
I was able to reproduce the measurements on a 'c6a.metal' instance - thanks.
This PR adds DuckDB v1.1.3 in-memory results for a run on a c6a.metal instance.