Backend
VL (Velox)
Bug description
In the DAG view, I noticed that the "shuffle write time total" metric was much larger than I expected. Digging into the Gluten code, I found that writeTime_ is added twice into the final metric by writeMetrics.incWriteTime.
In VeloxCelebornHashBasedColumnarShuffleWriter.scala, the write time is computed as splitResult.getTotalWriteTime + splitResult.getTotalPushTime. On the native side, totalWriteTime is accumulated into writeTime_, and totalPushTime is accumulated via the spillTime_ variable. Because the spillTime_ timer is still running while the payload is written, spillTime_ already includes writeTime_, which means writeTime_ is counted twice in the final write time metric, as the sketch below illustrates.
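To make the double counting concrete, here is a minimal, self-contained C++ sketch. ScopedTimer, writeTime_, and spillTime_ are named after the identifiers in the issue, but the ScopedTimer implementation, the writePayload/evict bodies, and the sleep durations are hypothetical stand-ins, not Gluten's actual code:

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>
#include <thread>

// Simplified stand-in for Gluten's ScopedTimer: adds the elapsed time to
// *target_ when the timer goes out of scope.
class ScopedTimer {
 public:
  explicit ScopedTimer(int64_t* target)
      : target_(target), start_(std::chrono::steady_clock::now()) {}
  ~ScopedTimer() {
    *target_ += std::chrono::duration_cast<std::chrono::nanoseconds>(
                    std::chrono::steady_clock::now() - start_)
                    .count();
  }

 private:
  int64_t* target_;
  std::chrono::steady_clock::time_point start_;
};

int64_t writeTime_ = 0;  // time spent writing the payload
int64_t spillTime_ = 0;  // intended: time spent pushing to Celeborn

void writePayload() {
  ScopedTimer timer(&writeTime_);
  std::this_thread::sleep_for(std::chrono::milliseconds(30));  // pretend write work
}

void evict() {
  ScopedTimer timer(&spillTime_);  // starts too early: its scope covers the write below
  writePayload();                  // this interval lands in BOTH writeTime_ and spillTime_
  std::this_thread::sleep_for(std::chrono::milliseconds(10));  // pretend push work
}

int main() {
  evict();
  // Mirrors the Scala side's totalWriteTime + totalPushTime summation:
  // reports ~70 ms even though only ~40 ms of work happened.
  std::cout << "reported total = " << (writeTime_ + spillTime_) / 1e6 << " ms\n";
}
```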
To fix it, I propose moving the ScopedTimer line a few lines down, so that the spillTime_ timer starts only after the write has finished; a sketch of that shape follows.
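Under the same assumptions, and reusing the ScopedTimer, writePayload, and spillTime_ stand-ins from the sketch above, the proposed reordering would look roughly like this (a sketch of the idea, not the actual Gluten diff):

```cpp
void evictFixed() {
  writePayload();                  // interval recorded once, in writeTime_ only
  ScopedTimer timer(&spillTime_);  // moved down: now measures only the push
  std::this_thread::sleep_for(std::chrono::milliseconds(10));  // pretend push work
}
// With this ordering, writeTime_ + spillTime_ reports ~40 ms instead of ~70 ms.
```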
Let me know if you want me to open a PR. Thanks.
Spark version
Spark-3.2.x
Spark configurations
No response
System information
No response
Relevant logs
No response
Hi, I think Payload#writeTime_ is actually counted in RssPartitionWriter::stop. And writeTime_ comes in via the #include "shuffle/Payload.h" at line 20 of that file. So it would indeed be counted twice.
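Schematically, the accounting path this comment describes might look like the sketch below. The struct shape and the stop signature are assumptions for illustration; only writeTime_, spillTime_, getTotalWriteTime, and getTotalPushTime come from the discussion above:

```cpp
#include <cstdint>

struct SplitResult {
  int64_t totalWriteTime;  // surfaced to Scala as splitResult.getTotalWriteTime
  int64_t totalPushTime;   // surfaced to Scala as splitResult.getTotalPushTime
};

// Hypothetical stand-in for the metric reporting in RssPartitionWriter::stop.
SplitResult stop(int64_t writeTime, int64_t spillTime) {
  return SplitResult{
      /*totalWriteTime=*/writeTime,  // writeTime_ counted here...
      /*totalPushTime=*/spillTime};  // ...and again inside spillTime_, whose
                                     // interval already covered the write
}
```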