Add MotherDuck-enabled pg_duckdb results. #272
I ran benchmark.sh on my local EC2 c6a.4xlarge machine and got these numbers:
Some of the hot runs (2nd and 3rd measurements) on my machine took > 1 sec (the maximum was > 11 sec for Q29), whereas in your measurements (almost) all queries finish well under one second. The affected queries are all scan/IO-heavy, i.e. they don't have selective filters (WHERE) that could be handled using indexes.
I am totally fine with merging this PR; I would just like to understand what caused the difference. Is the workload somehow split between the local machine and MotherDuck (i.e. some kind of hybrid execution)? In that case, I guess a different local machine (e.g. more cores, faster IO) could cause this; l. 4 doesn't specify which machine you used for your measurements. And this is really just speculation, but does the cloud component (MotherDuck) perhaps provide more or fewer resources (threads, IO, etc.) depending on the time of day? (After all, the free tier is used, for which such behavior would make sense.)
Let me investigate. This looks like a bug (that I've seen a couple of times and it went away) where performance got worse and worse over time, especially for the cold runs. I want to get a handle on it before submitting, since I don't think it is a good idea to submit numbers so far off from what you can reproduce on your own.
@rschu1ze Thanks for taking a look. Since it's Thanksgiving in the US, @jtigani asked me to take a look so you won't be slowed down.
Some of the difference can be explained by the location of your EC2 machine: our backends are in us-east-1. Can you share where you run your EC2 instance?
More importantly - I tried to find your run in our backends and, other than you creating `pgclick` through our UI, I couldn't find anything. This may sound stupid, but I want to double-check that you've set the environment variable `MOTHERDUCK_TOKEN` before running the script? It really feels like you have accidentally stored the data in Postgres and not in MotherDuck; that would explain a significant slowdown.
To double-check, I set up a c6a.4xlarge in us-east-1 and ran the benchmark again:
Results are much closer to the submitted results, so the difference seems related to the region.
Regarding `MOTHERDUCK_TOKEN`: benchmark.sh (line 22ff) checks that the variable is set. I already deleted the token used in the previous run (I think it was called "test"). For the new run, I used token "clickbench" in my MotherDuck account (mail address: [email protected]).
Anyway, I think we are good ... I'll merge. Thanks for the help.
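For readers following the thread, the kind of environment-variable guard discussed above could look roughly like the sketch below. This is an illustration only: the function name `require_motherduck_token` and the error message are hypothetical, and the actual check in benchmark.sh may be written differently.

```shell
#!/usr/bin/env bash
# Hypothetical guard, not copied from benchmark.sh.
# Returns non-zero when MOTHERDUCK_TOKEN is unset or empty, so the caller
# can stop before any data is loaded into the wrong place.
require_motherduck_token() {
    # ${VAR:-} expands to an empty string when VAR is unset, which keeps
    # the test safe even if the script runs under `set -u`.
    if [ -z "${MOTHERDUCK_TOKEN:-}" ]; then
        echo "MOTHERDUCK_TOKEN is not set; refusing to run." >&2
        return 1
    fi
    return 0
}
```

A script would typically call it near the top, for example `require_motherduck_token || exit 1`, so that a missing token fails fast instead of silently producing results that are not comparable.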