Add MotherDuck-enabled pg_duckdb results. #272

Merged
merged 8 commits into from
Nov 29, 2024

jtigani
Contributor

@jtigani jtigani commented Nov 26, 2024

Thank You for Your Contribution!

We appreciate your effort and contribution to the project. To ensure that your Pull Request (PR) adheres to our guidelines, please ensure to review the rules mentioned in our contribution guidelines:

ClickHouse/ClickBench Contribution Rules

Thank you for your attention to these details and for helping us maintain the quality and integrity of the project.

@rschu1ze rschu1ze self-assigned this Nov 26, 2024

@@ -0,0 +1,57 @@
{
Member

I ran benchmark.sh on my local EC2 c6a.4xlarge machine and got these numbers:

[0.310832,0.135982,0.137973],
[0.312424,0.145676,0.14565],
[0.367049,0.16804,0.168023],
[1.74935,0.171891,0.171829],
[2.29114,0.523813,0.520013],
[2.46478,0.777383,0.77676],
[2.00442,1.99079,2.01413],
[0.303728,0.147151,0.147396],
[2.38651,0.605923,0.606097],
[2.90069,0.794022,0.801774],
[1.75951,0.256617,0.260653],
[2.21079,0.285738,0.287225],
[2.74292,0.699291,0.696846],
[5.23709,1.02618,1.03795],
[2.6506,0.745238,0.746417],
[1.82356,0.585969,0.590522],
[5.35414,1.38093,1.42019],
[5.23847,1.32079,1.32186],
[9.49408,6.62594,6.65811],
[1.27376,0.17786,0.174985],
[20.2344,1.9744,1.94168],
[23.1244,1.78669,1.7944],
[44.4286,3.60808,3.55152],
[112.15,9.57462,9.56068],
[5.88666,1.09089,1.09729],
[2.51667,0.43618,0.439071],
[6.04578,1.09779,1.08893],
[19.925,1.7998,1.83581],
[17.4306,11.204,11.2168],
[0.548479,0.509858,0.518488],
[5.32779,0.772682,0.77066],
[12.1051,0.878097,0.871254],
[11.2436,6.14124,6.19371],
[21.214,5.52945,5.42852],
[21.1378,5.38582,5.53328],
[1.19671,0.657055,0.665458],
[0.407409,0.323923,0.322465],
[0.459891,0.279551,0.275829],
[0.458786,0.219163,0.218257],
[0.752248,0.472317,0.482665],
[0.357792,0.161885,0.161701],
[0.33603,0.146883,0.146818],
[0.566736,0.384091,0.384248],

Some of the hot runs (2nd + 3rd measurement) on my machine took > 1 sec (the maximum was >11 sec for Q29) whereas in your measurements, (almost) all queries finish well under one sec. The affected queries are all scan/IO-heavy, i.e. they don't have selective filters (WHERE) which could be handled using indexes.
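A minimal sketch of how the slow hot runs can be picked out of a result list. It assumes each row is `[cold, hot 1, hot 2]` in seconds and uses a handful of rows copied from the numbers above, keyed by their 0-based position in that list. The function name `slow_hot_runs` and the 1-second threshold parameter are illustrative, not part of ClickBench.

```python
# Each row is one query: [cold run, hot run 1, hot run 2], in seconds.
# The rows are copied from the first run above; keys are 0-based row positions.
rows = {
    0: [0.310832, 0.135982, 0.137973],
    6: [2.00442, 1.99079, 2.01413],
    18: [9.49408, 6.62594, 6.65811],
    23: [112.15, 9.57462, 9.56068],
    28: [17.4306, 11.204, 11.2168],
}


def slow_hot_runs(rows, threshold=1.0):
    """Return {row: fastest hot time} for rows whose fastest hot run exceeds threshold."""
    result = {}
    for row, times in rows.items():
        best_hot = min(times[1:])  # ignore the cold (first) measurement
        if best_hot > threshold:
            result[row] = best_hot
    return result


slow = slow_hot_runs(rows)
for row, seconds in sorted(slow.items()):
    print(f"row {row}: fastest hot run {seconds:.3f} s")
```

Taking the minimum of the two hot measurements keeps a single noisy run from flagging a query.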

I am totally fine with merging this PR; I'd just like to understand what caused the difference. Is the workload somehow split between the local machine and MotherDuck (i.e. some kind of hybrid execution)? In that case, I guess a different local machine (e.g. more cores, faster IO, etc.) could cause this; line 4 doesn't specify which machine you used for your measurements. And this is really just speculation, but does the cloud component (MotherDuck) perhaps provide more or fewer resources (threads, IO, etc.) based on the time of day? (After all, the free tier is used, for which such behavior would make sense.)

Contributor Author

@jtigani jtigani Nov 28, 2024

Let me investigate. This looks like a bug (that I've seen a couple of times and it went away) where performance got worse and worse over time, especially for the cold runs. I want to get a handle on it before submitting, since I don't think it is a good idea to submit numbers so far off from what you can reproduce on your own.

@bleskes bleskes Nov 29, 2024

@rschu1ze Thanks for taking a look. Since it's Thanksgiving in the US, @jtigani asked me to take a look so you won't be slowed down.

Some of the difference can be explained by the location of your EC2 machine: our backends are in us-east-1. Can you share where you run your EC2 instance?

More importantly, I tried to find your run in our backends and, other than you creating the pgclick through our UI, I couldn't find anything. This may sound stupid, but I want to double-check that you set the environment variable MOTHERDUCK_TOKEN before running the script. It really feels like you have accidentally stored the data in Postgres and not in MotherDuck, which would explain a significant slowdown.

Member

To double-check, I set up a c6a.4xlarge in us-east-1 and ran the benchmark again:

[0.137005,0.018994,0.012908],
[0.140741,0.018078,0.018248],
[0.134784,0.021106,0.017415],
[0.137873,0.024561,0.021223],
[0.274166,0.150139,0.149283],
[0.288274,0.166713,0.16712],
[0.131532,0.012273,0.013218],
[0.15448,0.016664,0.01604],
[0.311416,0.176908,0.17688],
[0.359286,0.237682,0.234269],
[0.183045,0.059172,0.058513],
[0.189613,0.067289,0.067073],
[0.302605,0.154747,0.155818],
[0.430244,0.301195,0.301791],
[0.282851,0.164777,0.166042],
[0.307493,0.181078,0.181029],
[0.465965,0.35104,0.350157],
[0.439346,0.310988,0.312933],
[0.712783,0.586545,0.588642],
[0.136413,0.014163,0.014281],
[0.404964,0.278645,0.281366],
[0.309573,0.197004,0.1906],
[0.415774,0.293224,0.308011],
[1.25393,1.17285,1.11026],
[0.195441,0.062959,0.060395],
[0.238035,0.060369,0.065979],
[0.192903,0.077515,0.077995],
[0.488857,0.360594,0.339036],
[1.50738,1.25838,1.39167],
[0.839093,0.729768,0.728274],
[0.262534,0.184269,0.185243],
[0.316918,0.19793,0.20941],
[0.895483,0.765861,0.728146],
[0.889801,0.7775,0.766686],
[0.904547,0.787961,0.768901],
[0.360276,0.24354,0.246335],
[0.193414,0.042963,0.043233],
[0.132383,0.021634,0.021998],
[0.15432,0.027969,0.028874],
[0.215762,0.087406,0.08749],
[0.136966,0.01455,0.014039],
[0.133496,0.013182,0.013417],
[0.136268,0.022607,0.016307],

Results are much closer to the submitted results, so the difference seems related to the region.
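As a rough illustration of the size of the gap, the sketch below compares the fastest hot run per row between the first run and the us-east-1 run. Only three rows are included, copied from the two lists in this thread and keyed by 0-based row position; the helper name `hot_run_ratio` is illustrative.

```python
# [cold, hot 1, hot 2] in seconds, copied from the two runs in this thread.
first_run = {
    0: [0.310832, 0.135982, 0.137973],
    23: [112.15, 9.57462, 9.56068],
    28: [17.4306, 11.204, 11.2168],
}
us_east_1_run = {
    0: [0.137005, 0.018994, 0.012908],
    23: [1.25393, 1.17285, 1.11026],
    28: [1.50738, 1.25838, 1.39167],
}


def hot_run_ratio(slow, fast):
    """Return {row: fastest hot time in `slow` / fastest hot time in `fast`}."""
    return {
        row: min(slow[row][1:]) / min(fast[row][1:])
        for row in slow
        if row in fast
    }


ratios = hot_run_ratio(first_run, us_east_1_run)
for row, ratio in sorted(ratios.items()):
    print(f"row {row}: first run was {ratio:.1f}x slower on hot runs")
```

This only quantifies the difference between the two runs; it does not by itself show that the region is the cause.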

Regarding MOTHERDUCK_TOKEN: benchmark.sh (line 22ff) checks that the variable is set. I deleted the token used in the previous run already (I think it was called "test"). For the new run, I used token "clickbench" in my MotherDuck account (mail address: [email protected]).
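For illustration only, a fail-fast check of that kind could look like the sketch below. It is written in Python rather than the shell used by benchmark.sh, it is not the script's actual code, and the function name `require_token` and the error message are assumptions.

```python
import os


def require_token(env=None, name="MOTHERDUCK_TOKEN"):
    """Return the token from the environment, or raise if it is missing or blank."""
    if env is None:
        env = os.environ  # default to the real process environment
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"{name} is not set; refusing to run the benchmark")
    return token


# Example with an explicit mapping instead of the real environment:
print(require_token({"MOTHERDUCK_TOKEN": "example-token"}))
```

Stopping before any data is loaded avoids a run that silently proceeds without the token.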

Anyway, I think we are good ... I'll merge. Thanks for the help.

@rschu1ze rschu1ze merged commit 3d2f593 into ClickHouse:main Nov 29, 2024