
[VL] Decouple ShuffleWriter and PartitionWriter #3932

Closed
wants to merge 4 commits into from


marin-ma (Contributor) commented Dec 5, 2023

In the previous implementation, the life cycle of PartitionWriter was managed by ShuffleWriter. However, PartitionWriter also held a raw pointer to ShuffleWriter and manipulated its internal state. This is considered bad design because the two classes then depend on each other in both directions. This PR removes the raw pointer to ShuffleWriter from PartitionWriter.

Detailed changes:

  1. In the previous implementation, ShuffleWriter::Stop called PartitionWriter::Stop to merge spilled files and flush cached payloads, and PartitionWriter then called back into ShuffleWriter to write the in-memory split buffers. After this change, ShuffleWriter transfers ownership of the split buffers to PartitionWriter at the beginning of ShuffleWriter::Stop, so the callback can be removed.
  2. PartitionWriter previously updated ShuffleWriter's metrics directly. This PR adds ShuffleWriterMetrics and PartitionWriter::populateMetrics, so PartitionWriter updates metrics only through a ShuffleWriterMetrics instance.

With the above changes, the raw pointer to ShuffleWriter can be removed from PartitionWriter.
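A minimal C++ sketch of the decoupled ownership described above, assuming a much simplified writer. It is not Gluten's code: only the names ShuffleWriter, PartitionWriter, ShuffleWriterMetrics and populateMetrics come from this PR, while SplitBuffer, addToBuffer, the metric fields and the exact stop() signature are hypothetical. ShuffleWriter owns the PartitionWriter, moves its split buffers into PartitionWriter::stop, and then asks the PartitionWriter to fill a ShuffleWriterMetrics instance, so no back-pointer is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Illustrative metrics holder. Only the type name comes from the PR;
// the fields are hypothetical.
struct ShuffleWriterMetrics {
  int64_t totalBytesWritten = 0;
  int64_t totalBuffersWritten = 0;
};

// Hypothetical stand-in for one partition's in-memory split buffer.
struct SplitBuffer {
  uint32_t partitionId;
  std::vector<uint8_t> bytes;
};

// PartitionWriter keeps no pointer back to ShuffleWriter.
class PartitionWriter {
 public:
  // Receives ownership of the remaining split buffers, so no callback
  // into ShuffleWriter is needed to write them.
  void stop(std::vector<std::unique_ptr<SplitBuffer>> buffers) {
    assert(!stopped_ && "stop() must only be called once");
    for (const auto& buffer : buffers) {
      // A real writer would merge spilled files, flush cached payloads and
      // write these buffers out; this sketch only counts them.
      bytesWritten_ += static_cast<int64_t>(buffer->bytes.size());
      ++buffersWritten_;
    }
    stopped_ = true;
  }

  // Reports into a metrics object instead of mutating ShuffleWriter state.
  void populateMetrics(ShuffleWriterMetrics* metrics) const {
    metrics->totalBytesWritten += bytesWritten_;
    metrics->totalBuffersWritten += buffersWritten_;
  }

 private:
  bool stopped_ = false;
  int64_t bytesWritten_ = 0;
  int64_t buffersWritten_ = 0;
};

// ShuffleWriter owns the PartitionWriter: the dependency is one-directional.
class ShuffleWriter {
 public:
  explicit ShuffleWriter(std::unique_ptr<PartitionWriter> partitionWriter)
      : partitionWriter_(std::move(partitionWriter)) {}

  // Hypothetical: buffer one partition's bytes in memory.
  void addToBuffer(uint32_t partitionId, std::vector<uint8_t> bytes) {
    buffers_.push_back(std::make_unique<SplitBuffer>(
        SplitBuffer{partitionId, std::move(bytes)}));
  }

  void stop() {
    // First hand the split buffers over to PartitionWriter.
    partitionWriter_->stop(std::move(buffers_));
    buffers_.clear();  // a moved-from vector is valid but unspecified
    // Then collect its metrics through ShuffleWriterMetrics only.
    partitionWriter_->populateMetrics(&metrics_);
  }

  const ShuffleWriterMetrics& metrics() const { return metrics_; }
  std::size_t bufferedCount() const { return buffers_.size(); }

 private:
  std::unique_ptr<PartitionWriter> partitionWriter_;
  std::vector<std::unique_ptr<SplitBuffer>> buffers_;
  ShuffleWriterMetrics metrics_;
};
```

Taking the buffers by value in PartitionWriter::stop makes the ownership transfer visible in the signature, and PartitionWriter can be exercised on its own without constructing a ShuffleWriter.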


github-actions bot commented Dec 5, 2023

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

marin-ma force-pushed the decouple-shuffle-writer branch from eca7962 to e7430aa on December 5, 2023 14:25
marin-ma marked this pull request as ready for review on December 6, 2023 12:42
FelixYBW (Contributor) commented Dec 8, 2023

Ready for review?

marin-ma (Contributor, Author) commented Dec 8, 2023

@FelixYBW Yes. This PR is ready for review.

marin-ma (Contributor, Author) commented Dec 8, 2023

/Benchmark Velox

GlutenPerfBot (Contributor)

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_3932_time.csv | log/native_master_12_07_2023_9c5314fd5_time.csv | difference (master - PR) | percentage (master / PR) |
|---|---:|---:|---:|---:|
| q1 | 33.46 | 33.97 | 0.508 | 101.52% |
| q2 | 23.57 | 24.95 | 1.372 | 105.82% |
| q3 | 38.92 | 37.91 | -1.006 | 97.41% |
| q4 | 37.88 | 37.39 | -0.484 | 98.72% |
| q5 | 74.96 | 71.30 | -3.657 | 95.12% |
| q6 | 7.70 | 5.38 | -2.321 | 69.86% |
| q7 | 88.12 | 85.81 | -2.306 | 97.38% |
| q8 | 88.48 | 85.80 | -2.681 | 96.97% |
| q9 | 126.91 | 123.51 | -3.404 | 97.32% |
| q10 | 45.60 | 45.87 | 0.270 | 100.59% |
| q11 | 20.76 | 20.29 | -0.471 | 97.73% |
| q12 | 25.71 | 26.73 | 1.019 | 103.96% |
| q13 | 46.24 | 45.73 | -0.510 | 98.90% |
| q14 | 16.68 | 16.69 | 0.010 | 100.06% |
| q15 | 28.37 | 28.46 | 0.089 | 100.31% |
| q16 | 16.76 | 15.56 | -1.205 | 92.81% |
| q17 | 105.56 | 103.37 | -2.185 | 97.93% |
| q18 | 154.80 | 149.01 | -5.786 | 96.26% |
| q19 | 12.91 | 12.84 | -0.062 | 99.52% |
| q20 | 28.43 | 27.78 | -0.650 | 97.71% |
| q21 | 225.88 | 223.73 | -2.151 | 99.05% |
| q22 | 13.40 | 13.39 | -0.007 | 99.95% |
| total | 1261.09 | 1235.47 | -25.618 | 97.97% |
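The report does not define its last two columns. Recomputing a few rows shows that "difference" is master minus PR and "percentage" is master / PR × 100, so a value below 100% means the PR build was slower on that query. A small C++ sketch of that arithmetic follows; the struct and function names are illustrative and not part of the benchmark tooling, and small mismatches come from the report rounding its inputs.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// One row of the performance report: run time of a query on this PR's build
// and on the master baseline (the two *_time.csv columns).
struct QueryTiming {
  std::string query;
  double prTime;
  double masterTime;
};

// "difference" column: master minus PR. Positive means the PR run was faster.
double difference(const QueryTiming& t) { return t.masterTime - t.prTime; }

// "percentage" column: master / PR * 100. Below 100% means the PR run was slower.
double percentage(const QueryTiming& t) {
  assert(t.prTime > 0.0);
  return t.masterTime / t.prTime * 100.0;
}

// The report prints rounded inputs, so recomputed values match only approximately.
bool approxEqual(double actual, double expected, double tolerance) {
  return std::fabs(actual - expected) <= tolerance;
}
```

For the total row of this first run, 1235.47 / 1261.09 × 100 ≈ 97.97%, i.e. the PR build took roughly 2% longer overall than master.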

marin-ma force-pushed the decouple-shuffle-writer branch from 3c42026 to a7c9e01 on December 8, 2023 14:29
marin-ma (Contributor, Author) commented Dec 8, 2023

/Benchmark Velox

GlutenPerfBot (Contributor)

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_3932_time.csv | log/native_master_12_07_2023_9c5314fd5_time.csv | difference (master - PR) | percentage (master / PR) |
|---|---:|---:|---:|---:|
| q1 | 33.34 | 33.97 | 0.629 | 101.89% |
| q2 | 25.35 | 24.95 | -0.409 | 98.39% |
| q3 | 39.71 | 37.91 | -1.800 | 95.47% |
| q4 | 37.68 | 37.39 | -0.289 | 99.23% |
| q5 | 74.00 | 71.30 | -2.694 | 96.36% |
| q6 | 7.04 | 5.38 | -1.663 | 76.39% |
| q7 | 88.14 | 85.81 | -2.325 | 97.36% |
| q8 | 88.93 | 85.80 | -3.128 | 96.48% |
| q9 | 125.74 | 123.51 | -2.231 | 98.23% |
| q10 | 43.91 | 45.87 | 1.957 | 104.46% |
| q11 | 20.01 | 20.29 | 0.282 | 101.41% |
| q12 | 27.43 | 26.73 | -0.708 | 97.42% |
| q13 | 46.37 | 45.73 | -0.640 | 98.62% |
| q14 | 14.71 | 16.69 | 1.985 | 113.50% |
| q15 | 30.18 | 28.46 | -1.715 | 94.32% |
| q16 | 15.78 | 15.56 | -0.229 | 98.55% |
| q17 | 104.75 | 103.37 | -1.381 | 98.68% |
| q18 | 149.88 | 149.01 | -0.863 | 99.42% |
| q19 | 14.57 | 12.84 | -1.731 | 88.12% |
| q20 | 28.96 | 27.78 | -1.186 | 95.91% |
| q21 | 225.25 | 223.73 | -1.518 | 99.33% |
| q22 | 13.40 | 13.39 | -0.011 | 99.92% |
| total | 1255.13 | 1235.47 | -19.666 | 98.43% |

marin-ma (Contributor, Author)

/Benchmark Velox

GlutenPerfBot (Contributor)

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_3932_time.csv | log/native_master_12_10_2023_e9ea4e3e4_time.csv | difference (master - PR) | percentage (master / PR) |
|---|---:|---:|---:|---:|
| q1 | 34.28 | 34.68 | 0.401 | 101.17% |
| q2 | 25.27 | 27.19 | 1.921 | 107.60% |
| q3 | 39.63 | 37.58 | -2.057 | 94.81% |
| q4 | 37.59 | 37.98 | 0.393 | 101.05% |
| q5 | 74.70 | 70.50 | -4.205 | 94.37% |
| q6 | 5.44 | 6.93 | 1.487 | 127.34% |
| q7 | 88.15 | 84.23 | -3.924 | 95.55% |
| q8 | 88.97 | 85.45 | -3.513 | 96.05% |
| q9 | 130.38 | 126.22 | -4.156 | 96.81% |
| q10 | 46.45 | 43.28 | -3.177 | 93.16% |
| q11 | 20.73 | 19.94 | -0.793 | 96.17% |
| q12 | 24.34 | 27.48 | 3.134 | 112.87% |
| q13 | 47.12 | 47.58 | 0.465 | 100.99% |
| q14 | 15.89 | 16.51 | 0.617 | 103.88% |
| q15 | 29.05 | 27.43 | -1.616 | 94.44% |
| q16 | 15.71 | 15.68 | -0.031 | 99.80% |
| q17 | 105.73 | 104.55 | -1.177 | 98.89% |
| q18 | 154.02 | 149.27 | -4.750 | 96.92% |
| q19 | 12.95 | 13.60 | 0.648 | 105.00% |
| q20 | 30.22 | 27.18 | -3.043 | 89.93% |
| q21 | 226.77 | 225.87 | -0.903 | 99.60% |
| q22 | 13.36 | 13.10 | -0.259 | 98.06% |
| total | 1266.75 | 1242.22 | -24.537 | 98.06% |

marin-ma (Contributor, Author)

Resolved by #4099

marin-ma closed this Jan 15, 2024