Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to verify expression fuzzer runs on a subset of rows #11267

Closed

Conversation

bikramSingh91
Copy link
Contributor

Summary:
Currently, the expression fuzzer has a phase where it re-runs rows
that did not throw an error to ensure evaluation is consistent for
them. To achieve this, it currently wraps the inputs with a dictionary
that only points to the subset of those rows. This results a change in
the encoding of inputs which can cause differences in eval paths taken
between phases. To address this and ensure the same paths are taken
for each evaluation phase, this change introduces the ability for the
expression verifier to only verify a subset of the input rows. The
aforementioned fuzzer run phase can only specify the non error rows
and maintain the original input row.

Follow up: After this change, it would be relevant to also store the
input selectivity vector. A subsequent change will be added that would
add this ability and make corresponding changes to the ExpressionRunner

Differential Revision: D64366745

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 15, 2024
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64366745

Copy link

netlify bot commented Oct 15, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 50d636f
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6721570f18d3220008fd3a8c

Copy link
Contributor

@kagamiori kagamiori left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall. Let me write up some instructions for testing it on expression fuzzer with PQR.

Comment on lines +86 to +91
RowVectorPtr reducedVector = std::dynamic_pointer_cast<RowVector>(
BaseVector::create(rowVector->type(), cnt, rowVector->pool()));
SelectivityVector rowsToCopy(cnt);
reducedVector->copy(rowVector.get(), rowsToCopy, rawIndices);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This piece of code can be simply replaced with BaseVector::wrapInDictionary(nullptr, indices, cnt, rowVector)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, the only reason I did this was to avoid creating a top level RowVector which is dictionary encoded which we do not expect in velox. This basically reduces some steps if we did (wrapInDictionary + flattenVector) instead.

This however will only be used by the presto runner, where it will probably write this to disk first. I have not had a chance to run it with PQR yet, but if that codepath supports an encoded top level row vector then we should be good and I can replace this with your suggestion.

@@ -102,7 +102,8 @@ TEST_F(ExpressionVerifierUnitTest, persistReproInfo) {
auto plan = parseExpression("always_throws(c0)", asRowType(data->type()));

removeDirecrtoryIfExist(localFs, reproPath);
VELOX_ASSERT_THROW(verifier.verify({plan}, data, nullptr, false), "");
VELOX_ASSERT_THROW(
verifier.verify({plan}, data, std::nullopt, nullptr, false), "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add a test case of verify() applying only on a subset of rows? (E.g., maybe make the unselected rows throw if they are evaluated, then assert verify() doesn't throw.)

bikramSingh91 pushed a commit to bikramSingh91/velox that referenced this pull request Oct 28, 2024
…ebookincubator#11267)

Summary:

Currently, the expression fuzzer has a phase where it re-runs rows
that did not throw an error to ensure evaluation is consistent for
them. To achieve this, it currently wraps the inputs with a dictionary
that only points to the subset of those rows. This results a change in
the encoding of inputs which can cause differences in eval paths taken
between phases. To address this and ensure the same paths are taken
for each evaluation phase, this change introduces the ability for the
expression verifier to only verify a subset of the input rows. The
aforementioned fuzzer run phase can only specify the non error rows
and maintain the original input row.

Follow up: After this change, it would be relevant to also store the
input selectivity vector. A subsequent change will be added that would
add this ability and make corresponding changes to the ExpressionRunner

Differential Revision: D64366745
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64366745

…ebookincubator#11267)

Summary:

Currently, the expression fuzzer has a phase where it re-runs rows
that did not throw an error to ensure evaluation is consistent for
them. To achieve this, it currently wraps the inputs with a dictionary
that only points to the subset of those rows. This results a change in
the encoding of inputs which can cause differences in eval paths taken
between phases. To address this and ensure the same paths are taken
for each evaluation phase, this change introduces the ability for the
expression verifier to only verify a subset of the input rows. The
aforementioned fuzzer run phase can only specify the non error rows
and maintain the original input row.

Follow up: After this change, it would be relevant to also store the
input selectivity vector. A subsequent change will be added that would
add this ability and make corresponding changes to the ExpressionRunner

Differential Revision: D64366745
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64366745

bikramSingh91 pushed a commit to bikramSingh91/velox that referenced this pull request Oct 29, 2024
…ebookincubator#11267)

Summary:

Currently, the expression fuzzer has a phase where it re-runs rows
that did not throw an error to ensure evaluation is consistent for
them. To achieve this, it currently wraps the inputs with a dictionary
that only points to the subset of those rows. This results a change in
the encoding of inputs which can cause differences in eval paths taken
between phases. To address this and ensure the same paths are taken
for each evaluation phase, this change introduces the ability for the
expression verifier to only verify a subset of the input rows. The
aforementioned fuzzer run phase can only specify the non error rows
and maintain the original input row.

Follow up: After this change, it would be relevant to also store the
input selectivity vector. A subsequent change will be added that would
add this ability and make corresponding changes to the ExpressionRunner

Differential Revision: D64366745
Copy link
Contributor

@kagamiori kagamiori left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I verified locally that the retry-with-try in expression fuzzer with PrestoQueryRunner works well after this change.

@bikramSingh91
Copy link
Contributor Author

LGTM. I verified locally that the retry-with-try in expression fuzzer with PrestoQueryRunner works well after this change.

Thank you @kagamiori for verifying using PrestoQueryRunner.

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 7c93eba.

Copy link

Conbench analyzed the 1 benchmark run on commit 7c93ebad.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants