-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ability to verify expression fuzzer runs on a subset of rows #11267
Add ability to verify expression fuzzer runs on a subset of rows #11267
Conversation
This pull request was exported from Phabricator. Differential Revision: D64366745 |
✅ Deploy Preview for meta-velox canceled.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me overall. Let me write up some instructions for testing it on expression fuzzer with PQR.
RowVectorPtr reducedVector = std::dynamic_pointer_cast<RowVector>( | ||
BaseVector::create(rowVector->type(), cnt, rowVector->pool())); | ||
SelectivityVector rowsToCopy(cnt); | ||
reducedVector->copy(rowVector.get(), rowsToCopy, rawIndices); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This piece of code can be simply replaced with BaseVector::wrapInDictionary(nullptr, indices, cnt, rowVector)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, the only reason I did this was to avoid creating a top level RowVector which is dictionary encoded which we do not expect in velox. This basically reduces some steps if we did (wrapInDictionary + flattenVector) instead.
This however will only be used by the presto runner, where it will probably write this to disk first. I have not had a chance to run it with PQR yet, but if that codepath supports an encoded top level row vector then we should be good and I can replace this with your suggestion.
@@ -102,7 +102,8 @@ TEST_F(ExpressionVerifierUnitTest, persistReproInfo) { | |||
auto plan = parseExpression("always_throws(c0)", asRowType(data->type())); | |||
|
|||
removeDirecrtoryIfExist(localFs, reproPath); | |||
VELOX_ASSERT_THROW(verifier.verify({plan}, data, nullptr, false), ""); | |||
VELOX_ASSERT_THROW( | |||
verifier.verify({plan}, data, std::nullopt, nullptr, false), ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also add a test case of verify() applying only on a subset of rows? (E.g., maybe make the unselected rows throw if they are evaluated, then assert verify() doesn't throw.)
09e2315
to
730d5cc
Compare
…ebookincubator#11267) Summary: Currently, the expression fuzzer has a phase where it re-runs rows that did not throw an error to ensure evaluation is consistent for them. To achieve this, it currently wraps the inputs with a dictionary that only points to the subset of those rows. This results a change in the encoding of inputs which can cause differences in eval paths taken between phases. To address this and ensure the same paths are taken for each evaluation phase, this change introduces the ability for the expression verifier to only verify a subset of the input rows. The aforementioned fuzzer run phase can only specify the non error rows and maintain the original input row. Follow up: After this change, it would be relevant to also store the input selectivity vector. A subsequent change will be added that would add this ability and make corresponding changes to the ExpressionRunner Differential Revision: D64366745
This pull request was exported from Phabricator. Differential Revision: D64366745 |
…ebookincubator#11267) Summary: Currently, the expression fuzzer has a phase where it re-runs rows that did not throw an error to ensure evaluation is consistent for them. To achieve this, it currently wraps the inputs with a dictionary that only points to the subset of those rows. This results a change in the encoding of inputs which can cause differences in eval paths taken between phases. To address this and ensure the same paths are taken for each evaluation phase, this change introduces the ability for the expression verifier to only verify a subset of the input rows. The aforementioned fuzzer run phase can only specify the non error rows and maintain the original input row. Follow up: After this change, it would be relevant to also store the input selectivity vector. A subsequent change will be added that would add this ability and make corresponding changes to the ExpressionRunner Differential Revision: D64366745
730d5cc
to
50d636f
Compare
This pull request was exported from Phabricator. Differential Revision: D64366745 |
…ebookincubator#11267) Summary: Currently, the expression fuzzer has a phase where it re-runs rows that did not throw an error to ensure evaluation is consistent for them. To achieve this, it currently wraps the inputs with a dictionary that only points to the subset of those rows. This results a change in the encoding of inputs which can cause differences in eval paths taken between phases. To address this and ensure the same paths are taken for each evaluation phase, this change introduces the ability for the expression verifier to only verify a subset of the input rows. The aforementioned fuzzer run phase can only specify the non error rows and maintain the original input row. Follow up: After this change, it would be relevant to also store the input selectivity vector. A subsequent change will be added that would add this ability and make corresponding changes to the ExpressionRunner Differential Revision: D64366745
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I verified locally that the retry-with-try in expression fuzzer with PrestoQueryRunner works well after this change.
Thank you @kagamiori for verifying using PrestoQueryRunner. |
This pull request has been merged in 7c93eba. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
Summary:
Currently, the expression fuzzer has a phase where it re-runs rows
that did not throw an error to ensure evaluation is consistent for
them. To achieve this, it currently wraps the inputs with a dictionary
that only points to the subset of those rows. This results a change in
the encoding of inputs which can cause differences in eval paths taken
between phases. To address this and ensure the same paths are taken
for each evaluation phase, this change introduces the ability for the
expression verifier to only verify a subset of the input rows. The
aforementioned fuzzer run phase can only specify the non error rows
and maintain the original input row.
Follow up: After this change, it would be relevant to also store the
input selectivity vector. A subsequent change will be added that would
add this ability and make corresponding changes to the ExpressionRunner
Differential Revision: D64366745