Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setup AggregationFuzzer to use Presto as source of truth in experimental jobs. #8314

Closed

Conversation

kgpai
Copy link
Contributor

@kgpai kgpai commented Jan 9, 2024

What ?
Recently support for PrestoQueryRunner was added to AggregationFuzzer, so that it can use Presto as a source of truth. This PR sets up an experimental job that uses the Presto Java container to run Aggregation Fuzzer with Presto as source of truth.

Why ?
Running the aggregation fuzzer against Presto gives us greater confidence on the correctness of our aggregate functions.

How ?

  1. Previously we created a docker image with support for java and Presto
  2. This still requires some changes to get Presto to work with AggregationFuzzer. These changes are listed below.
  3. Adds support for Hive metastore - we do this via a file based metastore by adding the hive.properties file to $PRESTO_HOME/etc/catalog (see Setting up Presto to use as source of truth for Fuzzers #8111)
  4. We also set the pid file on the launcher to ensure there is only one run of the Presto server
  5. Another thing to note is that we set the timezone to be 'Americas/Los_Angeles' due to details described here .

Testing:

There is no automated way to test this. Here are manual steps I follow:

  1. Create presto-java-container by running docker-compose build presto-java
  2. Ssh into the presto-java-container
  3. Copy Velox sources into the presto-java-container and build Velox
  4. Launch Presto service: /opt/start-prestojava.sh
  5. Run Aggregation fuzzer with --presto_url=http://127.0.0.1:8080 flag

Example Run: Here is an example run https://github.com/facebookincubator/velox/actions/runs/7508139395/job/20443035814

Copy link

netlify bot commented Jan 9, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 93eec92
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/65a6e80e2c1c0a0008ea497a

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 9, 2024
@kgpai kgpai requested review from assignUser and mbasmanova January 9, 2024 23:33
@kgpai
Copy link
Contributor Author

kgpai commented Jan 9, 2024

Testing this out ..

@kgpai kgpai force-pushed the presto_java_hive_metaserver branch 2 times, most recently from 29e9354 to b9c4a0a Compare January 10, 2024 18:55
@kgpai
Copy link
Contributor Author

kgpai commented Jan 10, 2024

local testing:

I0110 19:40:18.059751 26733 AggregationFuzzer.cpp:1006] Total functions tested: 8
I0110 19:40:18.059777 26733 AggregationFuzzer.cpp:1007] Total masked aggregations: 1 (10.00%)
I0110 19:40:18.059818 26733 AggregationFuzzer.cpp:1009] Total global aggregations: 0 (0.00%)
I0110 19:40:18.059832 26733 AggregationFuzzer.cpp:1011] Total group-by aggregations: 10 (100.00%)
I0110 19:40:18.059859 26733 AggregationFuzzer.cpp:1013] Total distinct aggregations: 0 (0.00%)
I0110 19:40:18.059876 26733 AggregationFuzzer.cpp:1015] Total aggregations over sorted inputs: 0 (0.00%)
I0110 19:40:18.059903 26733 AggregationFuzzer.cpp:1017] Total window expressions: 0 (0.00%)
I0110 19:40:18.059917 26733 AggregationFuzzer.cpp:1019] Total aggregations verified against reference DB: 1 (10.00%)
I0110 19:40:18.059934 26733 AggregationFuzzer.cpp:1021] Total aggregations not verified (non-deterministic function / not supported by reference DB / reference DB failed): 6 (60.00%) / 0 (0.00%) / 2 (20.00%)
I0110 19:40:18.059971 26733 AggregationFuzzer.cpp:1026] Total failed aggregations: 1 (10.00%)

@kgpai kgpai force-pushed the presto_java_hive_metaserver branch 3 times, most recently from 71530a6 to 266fe2f Compare January 11, 2024 00:26
@kgpai kgpai force-pushed the presto_java_hive_metaserver branch from 266fe2f to 57d8781 Compare January 11, 2024 05:46
Copy link
Contributor

@kagamiori kagamiori left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Great to see AggregationFuzzer with PQR enabled! By the way, don't we need to add a jvm.config file as mentioned in #8111?

@kgpai
Copy link
Contributor Author

kgpai commented Jan 11, 2024

@kagamiori That was already added in a previous PR.

@facebook-github-bot
Copy link
Contributor

@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kgpai kgpai force-pushed the presto_java_hive_metaserver branch 3 times, most recently from 0d4314e to e43acc2 Compare January 16, 2024 19:51
@facebook-github-bot
Copy link
Contributor

@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kgpai kgpai force-pushed the presto_java_hive_metaserver branch from e43acc2 to 93eec92 Compare January 16, 2024 20:33
@facebook-github-bot
Copy link
Contributor

@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@kgpai merged this pull request in 1bf01e5.

Copy link

Conbench analyzed the 1 benchmark run on commit 1bf01e58.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants