Add fuzzer for RowNumber operator (#9524)
Summary: The RowNumberFuzzer is a testing tool that automatically generates equivalent query plans and then executes these plans to validate the consistency of the results. It works as follows:

1. Data Generation: It starts by generating a random set of input data, also known as a vector. This data can have a variety of encodings and data layouts to ensure thorough testing.
2. Plan Generation: Generates two equivalent query plans, one with row-number over ValuesNode and the other over TableScanNode.
3. Query Execution: Executes those equivalent query plans using the generated data and asserts that the results are consistent across the different plans.
4. Iteration: This process is repeated multiple times to ensure reliability and robustness.

Pull Request resolved: #9524
Reviewed By: kevinwilfong
Differential Revision: D56767074
Pulled By: xiaoxmeng
fbshipit-source-id: 644b9a341e5273cb37efbdf0bd8254edcc54c974
1 parent f54787b, commit 38abde9
Showing 12 changed files with 891 additions and 7 deletions.
@@ -0,0 +1,55 @@
================
RowNumber Fuzzer
================

The RowNumberFuzzer is a testing tool that automatically generates equivalent query plans and then executes these plans
to validate the consistency of the results. It works as follows:

1. Data Generation: It starts by generating a random set of input data, also known as a vector. This data can
   have a variety of encodings and data layouts to ensure thorough testing.
2. Plan Generation: Generates two equivalent query plans: one runs row-number over a ValuesNode (the base plan)
   and the other runs it over a TableScanNode (the alter plan).
3. Query Execution: Executes those equivalent query plans using the generated data and asserts that the results are
   consistent across the different plans.

   i. Execute the base plan, compare the result with the reference (DuckDB or Presto) and use it as the expected result.
   #. Execute the alter plan multiple times with and without spill, and compare each result with the
      expected result.

4. Iteration: This process is repeated multiple times to ensure reliability and robustness.
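
The workflow above can be illustrated with a small, self-contained Python sketch. It is not the Velox implementation: ``base_plan`` and ``alter_plan`` are hypothetical stand-ins for the two query plans, each row is reduced to a single integer partition key, and the results are compared as multisets of ``(key, row_number)`` pairs on the assumption that output order is not guaranteed. The defaults (10 steps, 5 batches of 100 rows) mirror the defaults listed in this document.

```python
import random
from collections import Counter, defaultdict


def generate_batches(rng, num_batches=5, batch_size=100, num_keys=4):
    """Step 1: random input batches; each row is just a partition key."""
    return [
        [rng.randrange(num_keys) for _ in range(batch_size)]
        for _ in range(num_batches)
    ]


def base_plan(batches):
    """Stand-in for the base plan: number rows per key as they stream by."""
    seen = defaultdict(int)
    rows = []
    for batch in batches:
        for key in batch:
            seen[key] += 1
            rows.append((key, seen[key]))
    return rows


def alter_plan(batches):
    """Stand-in for the alter plan: count rows per key, then emit 1..n."""
    totals = Counter(key for batch in batches for key in batch)
    return [(key, n) for key, total in totals.items() for n in range(1, total + 1)]


def run_fuzzer(seed, steps=10):
    """Steps 2-4: produce both results, compare them, and repeat."""
    rng = random.Random(seed)
    for step in range(steps):
        batches = generate_batches(rng)
        expected = Counter(base_plan(batches))
        actual = Counter(alter_plan(batches))
        assert actual == expected, f"mismatch at step {step}, seed {seed}"
    return steps
```

Because every random choice flows from one seeded generator, rerunning ``run_fuzzer`` with the seed reported in a failure message replays the same inputs, which is the idea behind the ``--seed`` flag.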

How to run
----------

Use the ``velox_row_number_fuzzer_test`` binary to run the RowNumber fuzzer:

::

    velox/exec/tests/velox_row_number_fuzzer_test --seed 123 --duration_sec 60

By default, the fuzzer goes through 10 iterations. Use the ``--steps``
or ``--duration_sec`` flag to run the fuzzer for longer. Use ``--seed`` to
reproduce fuzzer failures.
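
When the fuzzer is driven from a script, for example to replay a failing seed, the command line can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of Velox; it only uses the binary path and the ``--seed``, ``--steps`` and ``--duration_sec`` flags named in this document.

```python
def build_fuzzer_command(
    binary="velox/exec/tests/velox_row_number_fuzzer_test",
    seed=None,
    steps=None,
    duration_sec=None,
):
    """Return the argument list for one fuzzer run (hypothetical helper)."""
    cmd = [binary]
    # Only pass the flags the caller actually set, so the binary's own
    # defaults apply to everything else.
    for flag, value in (
        ("--seed", seed),
        ("--steps", steps),
        ("--duration_sec", duration_sec),
    ):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd
```

The resulting list can be handed to ``subprocess.run(cmd, check=True)`` from the Python standard library, assuming the binary has been built at that path.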

Here is a full list of supported command line arguments.

* ``--steps``: How many iterations to run. Each iteration generates and
  verifies one set of equivalent row-number query plans. Default is 10.

* ``--duration_sec``: For how long to run in seconds. If both ``--steps``
  and ``--duration_sec`` are specified, ``--duration_sec`` takes precedence.

* ``--seed``: The seed used to generate random input vectors and plans.

* ``--v=1``: Verbose logging (from `Google Logging Library <https://github.com/google/glog#setting-flags>`_).

* ``--batch_size``: The size of input vectors to generate. Default is 100.

* ``--num_batches``: The number of input vectors of size ``--batch_size`` to
  generate. Default is 5.

* ``--enable_spill``: Whether to test with spilling or not. Default is true.

* ``--presto_url``: The PrestoQueryRunner URL along with its port number.

* ``--req_timeout_ms``: Timeout in milliseconds of an HTTP request to the PrestoQueryRunner.

If running from the CLion IDE, add ``--logtostderr=1`` to see the full output.
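
The interaction between ``--steps`` and ``--duration_sec`` described in the flag list can be sketched as a termination check. This is an illustrative model only: the function name is hypothetical, and treating ``duration_sec == 0`` as "not specified" is an assumption of the sketch rather than something this document states.

```python
def is_done(iteration, elapsed_sec, steps=10, duration_sec=0):
    """Decide whether the fuzzer loop should stop (illustrative sketch)."""
    # Assumption: a duration of 0 means the flag was not specified.
    if duration_sec > 0:
        # When a duration is given it takes precedence over the step count.
        return elapsed_sec >= duration_sec
    return iteration >= steps
```

With the default of 10 steps and no duration, the loop stops after the tenth iteration; once a duration is supplied, the step count is ignored and only elapsed time matters.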