[GLUTEN-4398][IT] Add Golden Files for TPC-H Spark32 + Gluten Execution Plan #4432

Merged: 11 commits, merged on Feb 29, 2024

@zwangsheng (Contributor) commented Jan 17, 2024

What changes were proposed in this pull request?

Add TPC-H golden files for Spark 3.2 and check whether a PR changes the Spark execution plan.

Continues the work from #4399.

  1. Add golden files for TPC-H (excluding Delta-related tests).
  2. For now this only works for Spark 3.2; other Spark versions will follow later.
  3. Each PR checks these golden files through the Velox backend unit tests and, if any query mismatches, uploads the actual plan files to GitHub Actions.

[Screenshot 2024-02-19 16:35:31]

Users can download these files and replace the corresponding local files for a quick fix.
Golden files are located at `${GLUTEN_REPO_HOME}/backends-velox/src/test/resources/tpch-approved-plan/${TPC_H_TEST_SUB_TYPE}/${SPARK_VERSION}/${QUERY_ID}.txt`,
for example:
`${GLUTEN_REPO_HOME}/backends-velox/src/test/resources/tpch-approved-plan/gluten-bhj-vanilla-be/spark322/1.txt`.
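The path layout above can be expressed as a small shell helper. This is a hypothetical sketch (the function name `golden_plan_path` is not part of Gluten) that only assumes `GLUTEN_REPO_HOME` points at a Gluten checkout:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Gluten): print the golden plan path for
# one TPC-H query, following the directory layout described above.
golden_plan_path() {
  local sub_type="$1"       # TPC-H test sub-type, e.g. gluten-bhj-vanilla-be
  local spark_version="$2"  # e.g. spark322
  local query_id="$3"       # e.g. 1
  printf '%s/backends-velox/src/test/resources/tpch-approved-plan/%s/%s/%s.txt\n' \
    "${GLUTEN_REPO_HOME:?GLUTEN_REPO_HOME must point at the Gluten checkout}" \
    "$sub_type" "$spark_version" "$query_id"
}

# Example: assumes the repository is checked out at /opt/gluten.
GLUTEN_REPO_HOME=/opt/gluten
golden_plan_path gluten-bhj-vanilla-be spark322 1
# prints /opt/gluten/backends-velox/src/test/resources/tpch-approved-plan/gluten-bhj-vanilla-be/spark322/1.txt
```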

When running the unit tests locally, users can find the actual plan files in a temporary directory; the path is printed in the unit test output log:

```
- TPC-H q5 *** FAILED ***
  Mismatch for query 5
  Actual Plan path: /tmp/tpch-approved-plan/v2-bhj/spark322/5.txt
  Golden Plan path: /opt/gluten/backends-velox/target/scala-2.12/test-classes/tpch-approved-plan/v2-bhj/spark322/5.txt (VeloxTPCHSuite.scala:101)
```
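Note that the golden path printed in the log points at the build output (`target/scala-2.12/test-classes`), which appears to be the copy of `src/test/resources` made during the build, so corrections should go into the source tree. A minimal sketch, assuming the local run writes actual plans under `/tmp/tpch-approved-plan` with the same `<sub-type>/<spark-version>/<query-id>.txt` layout shown above (the helper name `sync_actual_plans` is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Gluten): copy actual plans written by a
# local test run over the golden files in the source tree.
set -eu

sync_actual_plans() {
  local actual_dir="$1"  # e.g. /tmp/tpch-approved-plan
  local golden_dir="$2"  # e.g. $GLUTEN_REPO_HOME/backends-velox/src/test/resources/tpch-approved-plan
  if [ ! -d "$actual_dir" ]; then
    echo "No actual plans found under $actual_dir" >&2
    return 1
  fi
  mkdir -p "$golden_dir"
  # "dir/." copies the directory's contents, so the
  # <sub-type>/<spark-version>/<query-id>.txt layout is preserved and only
  # the files produced by the run are overwritten.
  cp -R "$actual_dir/." "$golden_dir/"
}
```

For example, `sync_actual_plans /tmp/tpch-approved-plan "$GLUTEN_REPO_HOME/backends-velox/src/test/resources/tpch-approved-plan"`, followed by `git diff` to review the plan changes before committing.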

How was this patch tested?

Added unit tests for Spark 3.2 that check the golden files.

#4398

Run Gluten Clickhouse CI

@zhztheplayer (Member) commented Jan 18, 2024

@zwangsheng Thanks in advance.

Could you say a bit about the term "golden file"? Thanks.

If I understand it correctly, when the execution plan for a query changes, developers should regenerate the files using gluten-it, is that correct?

If yes, I am worried that this approach puts an extra burden on developers, since compiling gluten-it takes quite some time (it packages everything). Did you consider adding this check to Gluten's main code?

Finally, could you share some background on why you want to add this check to Gluten (and its CI)? Thanks.

@zwangsheng (Contributor, Author) commented:

Thanks for your interest in this feature, @zhztheplayer.

> Could you say a bit about the term "golden file"? Thanks.

This is based on the Spark SQL module's PlanStabilitySuite, which detects how code changes (from a PR) alter the execution plan.

Gluten is closely tied to Spark execution plans, so golden files should be introduced in Gluten to ensure that code changes affecting the execution plan are detected first, which lets us evaluate those changes better.
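For readers unfamiliar with the pattern: a golden-file check stores an approved output per test and fails whenever freshly generated output differs from it. The sketch below illustrates the idea in shell with `diff`; it is a generic illustration that assumes both directories share the same layout, not Gluten's actual check (which lives in the Scala test suite), and `check_golden_files` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Generic golden-file check (illustrative only): compare a directory of
# approved ("golden") plans with freshly generated plans of the same layout.
check_golden_files() {
  local golden_dir="$1" actual_dir="$2"
  # -r recurses into sub-directories; -q only reports which files differ
  # (or exist on one side only) instead of printing full diffs.
  if diff -rq "$golden_dir" "$actual_dir"; then
    echo "All plans match the golden files."
  else
    echo "Plan mismatch detected; inspect the files listed above." >&2
    return 1
  fi
}
```

Comparing whole files keeps the check simple, but any plan change, even a harmless one, fails the test. That is the intended trade-off of golden files: every plan change gets an explicit review.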

> If I understand it correctly, when the execution plan for a query changes, developers should regenerate the files using gluten-it, is that correct?

Yes. If developers change related code, the golden files should be regenerated with the gluten-it module.

> If yes, I am worried that this approach puts an extra burden on developers, since compiling gluten-it takes quite some time (it packages everything).

At present, it seems this would affect the PR contribution process. Maybe we can build on this PR to reduce the compilation cost on the user side: during CI checks, we can upload the changed golden files as GitHub Actions artifacts, and users can download them instead of compiling locally. That seems to address the concerns you mentioned.

But in general, we should still have such a golden-file mechanism so that we don't miss relevant changes.

Thanks again. If you have other ideas for checking Spark execution plan changes, I'm open to them.

@zwangsheng changed the title from "[GLUTEN-4398][IT] Add TPC-H Spark + Gluten Execution Plan Golden File" to "[GLUTEN-4398][IT] Add Golden Files for TPC-H Spark + Gluten Execution Plan" on Jan 18, 2024

@zhztheplayer (Member) commented Jan 19, 2024

Thanks for the detailed explanation @zwangsheng.

> At present, it seems this would affect the PR contribution process. Maybe we can build on this PR to reduce the compilation cost on the user side: during CI checks, we can upload the changed golden files as GitHub Actions artifacts, and users can download them instead of compiling locally. That seems to address the concerns you mentioned.

I still feel the procedure is probably a little too complicated for developers. Gluten already has in-place TPC-H unit test code here:

https://github.com/oap-project/gluten/blob/main/backends-velox/src/test/scala/io/glutenproject/execution/VeloxTPCHSuite.scala

So, if possible, I would lean toward integrating the golden-file check into the current unit tests rather than the integration tests. Developers could then avoid rebuilding the whole project to update golden files when a plan changes. I understand this might require a number of changes to this PR, but let's see whether it is worth it.

I was considering bringing some of gluten-it's features into the unit tests, but that may blur the responsibilities of UT and IT. I am not sure. Still, the cost of using gluten-it to manually verify code changes is currently high.

A simple comparison between using IT (gluten-it) and UT:

it:
Developer change code -> Commits code change -> Run GHA CI -> Download golden files -> Replace local golden files -> Rerun GHA CI

ut (probably):
Developer change code -> Run UT to generate new golden files -> Replace local golden files -> Commit code change -> Run GHA CI

I assume the UT approach would take much less time than the IT one?

@zwangsheng marked this pull request as draft on January 19, 2024 10:52

@zwangsheng (Contributor, Author) commented:

> ut (probably):
> Developer change code -> Run UT to generate new golden files -> Replace local golden files -> Commit code change -> Run GHA CI

Thanks, I will try moving these golden file tests into the unit tests.

@zhztheplayer (Member) commented:

>> ut (probably):
>> Developer change code -> Run UT to generate new golden files -> Replace local golden files -> Commit code change -> Run GHA CI
>
> Thanks, I will try moving these golden file tests into the unit tests.

Thank you!


@zwangsheng changed the title from "[GLUTEN-4398][IT] Add Golden Files for TPC-H Spark + Gluten Execution Plan" to "[GLUTEN-4398][IT] Add Golden Files for TPC-H Spark32 + Gluten Execution Plan" on Feb 5, 2024
github-actions bot commented Feb 5, 2024: Run Gluten Clickhouse CI

@zwangsheng marked this pull request as ready for review on February 23, 2024 09:57
Review comment on lines +19 to +23:

```shell
if [ -z "$GITHUB_RUN_ID" ]
then
  echo "Unable to parse GITHUB_RUN_ID."
  exit 1
fi
```
Review comment (Member):

Should we check $GITHUB_JOB too?
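The reviewer's suggestion could be sketched as a small guard that validates several variables at once. `GITHUB_RUN_ID` and `GITHUB_JOB` are default environment variables in GitHub Actions; the helper name `require_env` is hypothetical, and how the script actually uses `GITHUB_JOB` is not shown in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical guard: fail if any of the named environment variables is unset
# or empty, mirroring the existing GITHUB_RUN_ID check.
require_env() {
  local name value
  for name in "$@"; do
    # Indirect expansion via eval avoids the bash-only ${!name} syntax; the
    # names are fixed strings chosen by the script, never user input.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Unable to parse $name." >&2
      return 1
    fi
  done
}

# In the CI script (not executed here):
#   require_env GITHUB_RUN_ID GITHUB_JOB || exit 1
```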

@zhztheplayer (Member) left a review:

Overall looks good. Thanks @zwangsheng !

@ulysses-you (Contributor) left a review:

It seems overkill to generate golden files for every type:

  • gluten-bhj-vanilla-be
  • vanilla-bhj-gluten-be
  • partitioned
  • v1
  • v1-bhj
  • v2
  • v2-bhj

Can we maintain only v1 and v1-bhj?

@zwangsheng (Contributor, Author) commented:

> It seems overkill to generate golden files for every type:
>
>   • gluten-bhj-vanilla-be
>   • vanilla-bhj-gluten-be
>   • partitioned
>   • v1
>   • v1-bhj
>   • v2
>   • v2-bhj
>
> Can we maintain only v1 and v1-bhj?

Reducing the number of types seems feasible, but since Spark still keeps the v2 code path, should we also keep the v2-related types?

@ulysses-you (Contributor) commented:

I think nobody would use the v2 code path to read Parquet; it is mainly used for data lakes.

@zhztheplayer (Member) commented:

Another question: after this patch, is there a best practice for regenerating all the golden files with a single command?

> ```
> - TPC-H q5 *** FAILED ***
>  Mismatch for query 5
>  Actual Plan path: /tmp/tpch-approved-plan/v2-bhj/spark322/5.txt
>  Golden Plan path: /opt/gluten/backends-velox/target/scala-2.12/test-classes/tpch-approved-plan/v2-bhj/spark322/5.txt (VeloxTPCHSuite.scala:101)
> ```

The example is good, but it seems developers would have to fix the files one by one?

@zwangsheng (Contributor, Author) commented Feb 26, 2024

> Another question: after this patch, is there a best practice for regenerating all the golden files with a single command?
>
>> ```
>> - TPC-H q5 *** FAILED ***
>>  Mismatch for query 5
>>  Actual Plan path: /tmp/tpch-approved-plan/v2-bhj/spark322/5.txt
>>  Golden Plan path: /opt/gluten/backends-velox/target/scala-2.12/test-classes/tpch-approved-plan/v2-bhj/spark322/5.txt (VeloxTPCHSuite.scala:101)
>> ```
>
> The example is good, but it seems developers would have to fix the files one by one?

No. GitHub Actions uploads the actual plan files as a .tgz artifact, so users can download it and replace all the files in one step.
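Replacing everything from the downloaded archive could look like the sketch below. It assumes the .tgz (taken from the GitHub Actions artifact, which may itself arrive zipped) contains the `<sub-type>/<spark-version>/<query-id>.txt` tree at its root; the archive name and the `apply_plan_archive` helper are hypothetical, so check the listing before trusting the extraction target.

```shell
#!/usr/bin/env bash
# Hypothetical helper: overwrite the golden files with the actual plans
# contained in a .tgz downloaded from CI.
apply_plan_archive() {
  local archive="$1"     # downloaded .tgz of actual plans (name assumed)
  local golden_dir="$2"  # e.g. $GLUTEN_REPO_HOME/backends-velox/src/test/resources/tpch-approved-plan
  if [ ! -f "$archive" ]; then
    echo "Archive not found: $archive" >&2
    return 1
  fi
  # Show what will be written so the layout can be checked first.
  tar -tzf "$archive"
  mkdir -p "$golden_dir"
  # Extract in place; files present in the archive replace the old ones.
  tar -xzf "$archive" -C "$golden_dir"
}
```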


@ulysses-you (Contributor) commented:

LGTM. Please add step-by-step docs to help contributors check and update golden files.


@zwangsheng (Contributor, Author) commented:

Hi @ulysses-you @zhztheplayer, I added an introduction to golden files in NewToGluten.md; please help review it.

Review comment on the added docs:

```
Actual Plan path: /tmp/tpch-approved-plan/v2-bhj/spark322/5.txt
Golden Plan path: /opt/gluten/backends-velox/target/scala-2.12/test-classes/tpch-approved-plan/v2-bhj/spark322/5.txt (VeloxTPCHSuite.scala:101)
```
For developers to update the golden plan, you can find the actual plan in the GitHub CI artifacts or in the local `/tmp/` directory.
Review comment (Contributor):

Can we add a .png screenshot to make it clearer?


@ulysses-you (Contributor) left a review:

LGTM. cc @zhztheplayer @PHILO-HE in case you have other comments.

@zhztheplayer (Member) left a review:

Thanks for your contribution!

@ulysses-you merged commit 7af6d0c into apache:main on Feb 29, 2024
19 checks passed
@GlutenPerfBot (Contributor) commented:

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_4432_time.csv | log/native_master_02_27_2024_94c8e7e0e_time.csv | difference | percentage |
|---|--:|--:|--:|--:|
| q1 | 32.86 | 33.59 | 0.729 | 102.22% |
| q2 | 24.58 | 24.40 | -0.186 | 99.25% |
| q3 | 37.48 | 38.91 | 1.426 | 103.81% |
| q4 | 36.89 | 36.13 | -0.755 | 97.95% |
| q5 | 70.37 | 70.98 | 0.615 | 100.87% |
| q6 | 5.51 | 7.08 | 1.572 | 128.51% |
| q7 | 84.79 | 84.44 | -0.347 | 99.59% |
| q8 | 84.63 | 86.39 | 1.753 | 102.07% |
| q9 | 122.91 | 121.48 | -1.432 | 98.84% |
| q10 | 43.11 | 44.06 | 0.955 | 102.22% |
| q11 | 20.57 | 19.96 | -0.611 | 97.03% |
| q12 | 29.38 | 29.43 | 0.057 | 100.19% |
| q13 | 44.42 | 45.22 | 0.798 | 101.80% |
| q14 | 19.15 | 16.33 | -2.820 | 85.27% |
| q15 | 28.84 | 28.90 | 0.056 | 100.19% |
| q16 | 16.22 | 15.96 | -0.252 | 98.44% |
| q17 | 102.65 | 103.10 | 0.455 | 100.44% |
| q18 | 149.35 | 148.22 | -1.134 | 99.24% |
| q19 | 12.63 | 12.53 | -0.098 | 99.22% |
| q20 | 26.36 | 26.71 | 0.349 | 101.32% |
| q21 | 227.75 | 224.21 | -3.548 | 98.44% |
| q22 | 13.67 | 13.66 | -0.006 | 99.96% |
| total | 1234.13 | 1231.71 | -2.423 | 99.80% |
