[FEA] Introduce kudo shuffle format. #2496
Comments
@liurenjie1024 please add details to this. As-is, it's just a headline with nothing else to go on.
Fixed.
rapids-bot pushed a commit to rapidsai/cudf that referenced this issue on Oct 30, 2024:
This is the first PR of [a larger one](NVIDIA/spark-rapids-jni#2532) to introduce a new serialization format. It makes `ai.rapids.cudf.HostMemoryBuffer#copyFromStream` public. For more background, see NVIDIA/spark-rapids-jni#2496

Authors:
- Renjie Liu (https://github.com/liurenjie1024)
- Jason Lowe (https://github.com/jlowe)

Approvers:
- Jason Lowe (https://github.com/jlowe)
- Alessandro Bellina (https://github.com/abellina)

URL: #17179
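As context for why that change matters: exposing `copyFromStream` lets a shuffle reader copy serialized bytes from the input stream directly into host memory instead of staging them in an on-heap byte array first. The sketch below assumes the `copyFromStream(destOffset, in, byteLength)` signature and uses a hypothetical 4-byte length-prefixed framing; it is illustrative only, not code from this issue or from the kudo implementation.

```java
import ai.rapids.cudf.HostMemoryBuffer;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ShuffleBlockReader {
  /**
   * Reads one length-prefixed block from a shuffle input stream directly into
   * host memory. The 4-byte length prefix is a framing invented for this
   * sketch; the real kudo format defines its own header.
   */
  public static HostMemoryBuffer readBlock(InputStream in) throws IOException {
    DataInputStream din = new DataInputStream(in);
    int length = din.readInt();
    HostMemoryBuffer buffer = HostMemoryBuffer.allocate(length);
    try {
      // Copy the payload from the stream into host memory without an
      // intermediate on-heap byte[] copy.
      buffer.copyFromStream(0, din, length);
      return buffer;
    } catch (IOException | RuntimeException e) {
      buffer.close();
      throw e;
    }
  }
}
```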
Design
The kudo serialization format is optimized for the columnar batch serialization used during Spark shuffle, and it significantly improves serialization/deserialization time compared to the jcudf serialization format. The improvements are based on two observations about how shuffle uses these batches. A hypothetical sketch of the general approach follows.
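To make the idea concrete, here is a minimal, purely hypothetical sketch of serializing a sliced column as a small header plus raw buffers, which is the general shape a columnar shuffle format takes. The class and field names below are invented for illustration and do not reflect the actual kudo layout.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical host-side view of one column slice; not a kudo or cudf class.
final class ColumnSlice {
  final int rowCount;          // number of rows in this slice
  final byte[] validityBuffer; // packed validity bits for the slice (may be empty)
  final byte[] dataBuffer;     // raw fixed-width data for the slice

  ColumnSlice(int rowCount, byte[] validityBuffer, byte[] dataBuffer) {
    this.rowCount = rowCount;
    this.validityBuffer = validityBuffer;
    this.dataBuffer = dataBuffer;
  }
}

final class SliceWriter {
  /**
   * Writes a small header (row count and buffer lengths) followed by the raw
   * buffers. Keeping the payload in a contiguous, column-major layout is what
   * lets a receiver concatenate many such slices cheaply on the read side.
   */
  static void write(ColumnSlice slice, OutputStream out) throws IOException {
    DataOutputStream dout = new DataOutputStream(out);
    dout.writeInt(slice.rowCount);
    dout.writeInt(slice.validityBuffer.length);
    dout.writeInt(slice.dataBuffer.length);
    dout.write(slice.validityBuffer);
    dout.write(slice.dataBuffer);
    dout.flush();
  }
}
```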
Performance
We have observed serialization time improvements of 30%-4000%, deserialization time improvements of up to 200%, and similar concat batching performance.