[VL] Move ColumnarBuildSideRelation's memory occupation to Spark off-heap #7750
Comments
Do you mean we can allocate the batch in Spark's off-heap memory? If so, it's a good solution. |
Correct. I assume it could be straightforward to simply change the allocation to off-heap. It is unlikely to be similar to Gazelle's case, which used to be tricky. |
Hi, I would like to do this ticket to learn more about Gluten. Could you please point me to where an off-heap allocated container is used in the code? Thank you. |
Thank you in advance for helping @Zand100 . You can refer to vanilla Spark's code UnsafeHashedRelation where So I can assign this ticket to you, I suppose? |
@Zand100 You may also consider upstreaming the PR to Spark. It is another solution to the off-heap/on-heap conflict: we move all of Spark's large memory allocations off-heap once off-heap is enabled. |
Thank you! Just to check I'm on the right track, I'm trying to use the Yes, you can assign the ticket to me. |
Hi, how should I handle constructing a
If |
Could you please review the draft #7885? |
Not necessarily to use |
Thank you! Could you please review the draft of the new binary container? #7902 |
Could you please review #7944 ? Should |
Move ColumnarBuildSideRelation's memory occupation to Spark off-heap
We are very glad to see the discussion here. In our production environment, we have also long been troubled by out-of-memory (OOM) problems caused by the broadcast build relation using heap memory. We adopted an approach similar to the one proposed by @zhztheplayer and made further optimizations (such as dividing large batches into small batches) for our production scenario. Current gluten implement
Proposed design
It went through two rounds of iterative development in our internal environment. Round1: using unsafe off-heap memory to store broadcast batches on the executor
Additionally, another problem emerges where a certain batch in
Round2: serialize more small batches to construct
|
Thank you for the comprehensive design, @zjuwangg! Let's move forward with an initial patch. Would you like to place the content in a Google doc on which everyone can comment? Thanks. You can send a mail including the Google doc link to [email protected]. |
The initial patch can be found in #8127, and I have also drafted a design Google doc: https://docs.google.com/document/d/1eZNWPUEdiz2JPJfhyVn9hrk6SqJFRNzOMZm6u5Yredk/edit?tab=t.0#heading=h.1wu7kc4pvnqd. Eager to hear your thoughts and opinions! |
So far, ColumnarBuildSideRelation is allocated in Spark JVM heap memory. It appears that we can replace batches: Array[Array[Byte]] with an off-heap allocated container to move the memory usage off-heap. There should be a simple solution that doesn't require too much refactoring. (See #8127 (comment) about the edit.) This could avoid some of the heap OOM issues.
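
To make the idea concrete, here is a minimal sketch of what "an off-heap allocated container" for the serialized batches could look like. It is not taken from Gluten or from #8127: the class name OffHeapBatches and its methods are hypothetical, it is written in plain Java so it runs without Spark on the classpath, and it uses the JDK's direct ByteBuffer as a stand-in for off-heap storage. Direct buffers live outside the JVM heap but are not accounted for by Spark's off-heap memory manager, so an actual patch would presumably need to allocate through Spark's own memory accounting instead.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical replacement for an on-heap byte[][] field (the Scala
 * Array[Array[Byte]]): each serialized batch is copied into its own
 * direct buffer, so the payload no longer occupies JVM heap.
 * Assumption: direct ByteBuffers stand in for Spark-managed off-heap memory.
 */
final class OffHeapBatches {
    private final List<ByteBuffer> buffers = new ArrayList<>();

    /** Copies every on-heap batch into a freshly allocated direct buffer. */
    OffHeapBatches(byte[][] batches) {
        for (byte[] batch : batches) {
            ByteBuffer buf = ByteBuffer.allocateDirect(batch.length);
            buf.put(batch);
            buf.flip(); // make the written bytes readable from position 0
            buffers.add(buf);
        }
    }

    /** Number of batches held by this container. */
    int numBatches() {
        return buffers.size();
    }

    /** Total payload size in bytes across all batches. */
    long totalBytes() {
        long total = 0;
        for (ByteBuffer b : buffers) {
            total += b.capacity();
        }
        return total;
    }

    /** Copies batch i back on-heap, e.g. to hand it to a deserializer. */
    byte[] get(int i) {
        // duplicate() shares the content but has its own position and limit,
        // so repeated reads do not disturb the stored buffer.
        ByteBuffer view = buffers.get(i).duplicate();
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }
}
```

The sketch keeps one buffer per batch so that callers can keep iterating batch by batch, as they would over the original array; only the storage location changes.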