Add ability to verify expression fuzzer runs on a subset of rows #11267

bikramSingh91 · 2024-10-15T21:56:01Z

Summary:
Currently, the expression fuzzer has a phase where it re-runs the rows
that did not throw an error, to ensure evaluation is consistent for
them. To do so, it wraps the inputs in a dictionary that points only
to that subset of rows. This results in a change in the encoding of
the inputs, which can cause different eval paths to be taken between
phases. To address this and ensure the same paths are taken in each
evaluation phase, this change introduces the ability for the
expression verifier to verify only a subset of the input rows. The
aforementioned fuzzer phase can now specify only the non-error rows
and keep the original input unchanged.
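The difference between the two approaches can be illustrated with a small standalone model. This is only a sketch and not Velox code: `Encoding`, `ToyVector`, `wrapSubset`, and `evalSelected` are hypothetical names that stand in for vector encodings, dictionary wrapping, and evaluation driven by a row selection. The doubling in `evalSelected` is a placeholder for an arbitrary expression.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a vector encoding; not a Velox type.
enum class Encoding { kFlat, kDictionary };

// Toy column: values plus an optional index indirection.
struct ToyVector {
  Encoding encoding;
  std::vector<int> values;
  std::vector<std::size_t> indices; // used only for kDictionary

  std::size_t size() const {
    return encoding == Encoding::kFlat ? values.size() : indices.size();
  }

  int valueAt(std::size_t row) const {
    return encoding == Encoding::kFlat ? values[row] : values[indices[row]];
  }
};

// Models the previous approach: expose only the selected rows through a
// dictionary wrapper. The input seen by the evaluator changes encoding.
ToyVector wrapSubset(const ToyVector& input, const std::vector<bool>& selected) {
  assert(input.encoding == Encoding::kFlat);
  assert(selected.size() == input.size());
  ToyVector wrapped{Encoding::kDictionary, input.values, {}};
  for (std::size_t row = 0; row < selected.size(); ++row) {
    if (selected[row]) {
      wrapped.indices.push_back(row);
    }
  }
  return wrapped;
}

// Models the approach described in the summary: evaluate against the
// original input and skip unselected rows. The encoding is untouched.
std::vector<std::optional<int>> evalSelected(
    const ToyVector& input,
    const std::vector<bool>& selected) {
  assert(selected.size() == input.size());
  std::vector<std::optional<int>> result(input.size());
  for (std::size_t row = 0; row < input.size(); ++row) {
    if (selected[row]) {
      result[row] = input.valueAt(row) * 2; // placeholder expression
    }
  }
  return result;
}
```

In this model, `wrapSubset` hands the evaluator a dictionary-encoded input with two rows, while `evalSelected` keeps the flat four-row input and simply leaves the unselected result slots empty.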

Follow up: After this change, it would also be useful to store the
input selectivity vector. A subsequent change will add this ability
and make the corresponding changes to the ExpressionRunner.

Differential Revision: D64366745

facebook-github-bot · 2024-10-15T21:56:10Z

This pull request was exported from Phabricator. Differential Revision: D64366745

netlify · 2024-10-15T21:56:16Z

✅ Deploy Preview for meta-velox canceled.

Name	Link
🔨 Latest commit	`50d636f`
🔍 Latest deploy log	https://app.netlify.com/sites/meta-velox/deploys/6721570f18d3220008fd3a8c

kagamiori

Looks good to me overall. Let me write up some instructions for testing it on expression fuzzer with PQR.

kagamiori · 2024-10-15T23:27:42Z

velox/expression/tests/ExpressionVerifier.cpp

+  RowVectorPtr reducedVector = std::dynamic_pointer_cast<RowVector>(
+      BaseVector::create(rowVector->type(), cnt, rowVector->pool()));
+  SelectivityVector rowsToCopy(cnt);
+  reducedVector->copy(rowVector.get(), rowsToCopy, rawIndices);


nit: This piece of code can be simply replaced with BaseVector::wrapInDictionary(nullptr, indices, cnt, rowVector)?

bikramSingh91 · 2024-10-26

True, the only reason I did this was to avoid creating a top-level RowVector that is dictionary encoded, which we do not expect in Velox. This basically saves some steps compared to doing (wrapInDictionary + flattenVector) instead.

However, this will only be used by the Presto runner, which will probably write the vector to disk first. I have not had a chance to run it with PQR yet, but if that code path supports an encoded top-level row vector then we should be good and I can replace this with your suggestion.
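The trade-off discussed in this thread can be modelled outside Velox. The sketch below is an illustration under assumptions: `copyRows`, `DictionaryView`, and `flatten` are hypothetical helpers, not Velox APIs. `copyRows` mirrors the create-then-copy approach in the diff, while `DictionaryView` plus `flatten` mirrors wrapping in a dictionary and flattening afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct approach: allocate a flat result and copy the rows named by
// `indices` into it in a single pass.
std::vector<int> copyRows(
    const std::vector<int>& source,
    const std::vector<std::size_t>& indices) {
  std::vector<int> reduced;
  reduced.reserve(indices.size());
  for (std::size_t index : indices) {
    assert(index < source.size());
    reduced.push_back(source[index]);
  }
  return reduced;
}

// Two-step approach, step 1: a non-owning view that maps each row to a
// position in `base`. The caller must keep `base` alive.
struct DictionaryView {
  const std::vector<int>* base;
  std::vector<std::size_t> indices;
};

// Two-step approach, step 2: materialize the view into a flat vector.
std::vector<int> flatten(const DictionaryView& view) {
  std::vector<int> flat;
  flat.reserve(view.indices.size());
  for (std::size_t index : view.indices) {
    assert(index < view.base->size());
    flat.push_back((*view.base)[index]);
  }
  return flat;
}
```

Both routes end with the same flat rows. The direct copy skips the intermediate wrapper, which is the saving described in the reply above.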

kagamiori · 2024-10-15T23:29:40Z

velox/expression/tests/ExpressionVerifierUnitTest.cpp

@@ -102,7 +102,8 @@ TEST_F(ExpressionVerifierUnitTest, persistReproInfo) {
    auto plan = parseExpression("always_throws(c0)", asRowType(data->type()));

    removeDirecrtoryIfExist(localFs, reproPath);
-    VELOX_ASSERT_THROW(verifier.verify({plan}, data, nullptr, false), "");
+    VELOX_ASSERT_THROW(
+        verifier.verify({plan}, data, std::nullopt, nullptr, false), "");


Could you also add a test case of verify() applying only on a subset of rows? (E.g., maybe make the unselected rows throw if they are evaluated, then assert verify() doesn't throw.)
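The requested test shape can be sketched without the Velox test harness. Everything here is hypothetical: `throwsOnNegative` stands in for an expression that fails on some rows, and `verifyRows` stands in for a verify call that accepts an optional row selection, loosely mirroring the `std::nullopt` argument passed in the diff above. It is not the signature of `ExpressionVerifier::verify`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical expression that fails on negative inputs.
int throwsOnNegative(int value) {
  if (value < 0) {
    throw std::invalid_argument("negative input");
  }
  return value + 1;
}

// Hypothetical verify step: evaluates the selected rows, or every row when
// no selection is given, and returns how many rows were evaluated.
std::size_t verifyRows(
    const std::vector<int>& data,
    const std::optional<std::vector<bool>>& rows) {
  assert(!rows.has_value() || rows->size() == data.size());
  std::size_t evaluated = 0;
  for (std::size_t row = 0; row < data.size(); ++row) {
    if (rows.has_value() && !(*rows)[row]) {
      continue; // unselected rows must never reach the expression
    }
    throwsOnNegative(data[row]);
    ++evaluated;
  }
  return evaluated;
}

// Reports whether verifyRows throws for the given selection.
bool verifyThrows(
    const std::vector<int>& data,
    const std::optional<std::vector<bool>>& rows) {
  try {
    verifyRows(data, rows);
    return false;
  } catch (const std::invalid_argument&) {
    return true;
  }
}
```

A test along these lines would assert that `verifyThrows` is true when all rows are evaluated and false once the failing row is deselected, which shows that unselected rows were never evaluated.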

facebook-github-bot · 2024-10-28T23:46:35Z

This pull request was exported from Phabricator. Differential Revision: D64366745

facebook-github-bot · 2024-10-29T21:43:54Z

This pull request was exported from Phabricator. Differential Revision: D64366745

kagamiori

LGTM. I verified locally that the retry-with-try in expression fuzzer with PrestoQueryRunner works well after this change.

bikramSingh91 · 2024-10-30T17:25:34Z

> LGTM. I verified locally that the retry-with-try in expression fuzzer with PrestoQueryRunner works well after this change.

Thank you @kagamiori for verifying using PrestoQueryRunner.

facebook-github-bot · 2024-10-30T22:16:25Z

This pull request has been merged in 7c93eba.

conbench-facebook · 2024-10-30T22:48:27Z

Conbench analyzed the 1 benchmark run on commit 7c93ebad.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

facebook-github-bot added the CLA Signed label Oct 15, 2024

facebook-github-bot added the fb-exported label Oct 15, 2024

kagamiori reviewed Oct 15, 2024

bikramSingh91 force-pushed the export-D64366745 branch from 09e2315 to 730d5cc on October 28, 2024 23:46

bikramSingh91 force-pushed the export-D64366745 branch from 730d5cc to 50d636f on October 29, 2024 21:43

kagamiori approved these changes Oct 30, 2024

facebook-github-bot closed this in 7c93eba Oct 30, 2024

facebook-github-bot added the Merged label Oct 30, 2024
