refactor: Skipping slicing on shuffle arrays in shuffle reader #189
Because the shuffle writer now sets the shuffle batch size, we no longer need to slice shuffle arrays in the shuffle reader.
Do you mean here?
Yes. The native shuffle writer sets the batch size too.
In that particular place, the batch size is configurable through
Although it is a separate config, I think it should not be far from the batch size. Actually, I'm wondering whether we can just use the batch size as the shuffle batch size to simplify things. At least, I don't think we would set the shuffle batch size larger than the batch size (that would not make sense given the original purpose of the config). And for a shuffle batch size smaller than the batch size, there is no reason to slice at all.
Can we at least add a note to that configuration? At the moment, nothing stops the config from being set to a value larger than the batch size.
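For illustration, the constraint raised here could also be enforced in code rather than only documented. The following is a minimal sketch, not the project's implementation: the function name `effective_shuffle_batch_size` and its parameters are hypothetical and do not correspond to actual config keys or APIs in this repository.

```python
def effective_shuffle_batch_size(batch_size: int, shuffle_batch_size: int) -> int:
    """Cap the shuffle batch size at the regular batch size.

    Both parameter names are hypothetical stand-ins for the two configs
    discussed in this thread.
    """
    if batch_size <= 0 or shuffle_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    # A shuffle batch larger than the regular batch size would have to be
    # sliced again on the read side, so clamp it here instead.
    return min(batch_size, shuffle_batch_size)
```

With a guard like this, a shuffle batch size set above the batch size would silently fall back to the batch size. Raising an error instead would be an equally reasonable choice.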
Okay. That is better.
Added a note to
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@             Coverage Diff              @@
##               main     #189      +/-   ##
============================================
+ Coverage     33.29%   33.30%   +0.01%
- Complexity      766      767       +1
============================================
  Files           107      107
  Lines         35385    35372      -13
  Branches       7658     7657       -1
============================================
+ Hits          11781    11782       +1
+ Misses        21157    21144      -13
+ Partials       2447     2446       -1

☔ View full report in Codecov by Sentry.
Merged. Thanks.
Which issue does this PR close?
Closes #.
Rationale for this change
Because the shuffle writer now sets the shuffle batch size, we no longer need to slice shuffle arrays in the shuffle reader.
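The reasoning can be sketched with a toy model in which plain Python lists stand in for shuffle arrays. The helpers `write_batches` and `slice_batch` are hypothetical names for illustration only, not functions from this project. Assuming the writer never emits a batch larger than the reader's batch size, slicing on the read side returns every batch unchanged:

```python
from typing import Iterator, List


def write_batches(rows: List[int], shuffle_batch_size: int) -> Iterator[List[int]]:
    """Writer side: emit batches of at most shuffle_batch_size rows."""
    for start in range(0, len(rows), shuffle_batch_size):
        yield rows[start:start + shuffle_batch_size]


def slice_batch(batch: List[int], batch_size: int) -> List[List[int]]:
    """Reader side: split a batch into pieces of at most batch_size rows."""
    return [batch[i:i + batch_size] for i in range(0, len(batch), batch_size)]


rows = list(range(10))
written = list(write_batches(rows, shuffle_batch_size=4))

# Shuffle batch size (4) does not exceed the reader's batch size (8),
# so re-slicing each written batch yields exactly the same batches.
resliced = [piece for batch in written for piece in slice_batch(batch, batch_size=8)]
assert resliced == written
```

The equality holds only while the shuffle batch size does not exceed the batch size. If it were larger, the reader would receive oversized batches once slicing is removed.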
What changes are included in this PR?
How are these changes tested?