[GLUTEN-6849][VL] Call static initializers once in Spark local mode / when session is renewed #6855
backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxListenerApi.scala
}
getLibraryLoaderForOS(systemName, systemVersion, system)
val conf = pc.conf
if (inLocalMode(conf)) {
Maybe we can directly call SparkResourceUtil.isLocalMaster?
I am OK with either... Keeping it would shorten the calling code a little bit.
if (!driverInitialized.compareAndSet(false, true)) {
  // Make sure we call the static initializers only once.
  logInfo(
    "Skip rerunning static initializers since they are only supposed to run once." +
      " You see this message probably because you are creating a new SparkSession.")
I don’t quite understand how onDriverStart would be called multiple times. Can you explain in detail?
It happens in Spark local mode. Spark creates one driver and one executor in that mode, in the current process.
I understand this. It should be once for onDriverStart and once for onExecutorStart, but that does not mean onDriverStart is called twice.
Oops, got it wrong. onDriverStart will be called twice when the Spark session is recreated. onExecutorStart may be called twice when dynamic allocation is enabled; I am not sure about this one.
Can you give an example of how spark session is recreated, cloneSession or other?
About onExecutorStart, I think it will not be called twice, because with dynamic allocation each newly added executor runs in a new JVM.
> Can you give an example of how spark session is recreated, cloneSession or other?
Please refer to SparkSession.stop or SparkContext.stop.
If we just re-create the SparkSession, it will not restart the driver. Re-creating the SparkContext will restart the driver, but a new SparkConf may be set, so is it better to re-initialize once in that case?
Yes, eventually we should remove the flags and do re-initializations. See my comment and the issue #6862
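For readers following the thread, the once-only guard discussed above can be sketched in isolation. This is a minimal illustration, not the actual VeloxListenerApi code: only driverInitialized and compareAndSet come from the diff, while the object name OnceOnlyInitSketch, the initCount counter, and the Boolean return value are hypothetical and exist only to make the behavior observable.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for the listener; not Gluten's real API.
object OnceOnlyInitSketch {
  // Process-wide flag: flips from false to true exactly once.
  private val driverInitialized = new AtomicBoolean(false)

  // Illustrative counter of how many times the initializers actually ran.
  @volatile var initCount: Int = 0

  /** Returns true if the initializers ran, false if the call was skipped. */
  def onDriverStart(): Boolean = {
    if (!driverInitialized.compareAndSet(false, true)) {
      // A second call in the same JVM (for example after the session is
      // stopped and a new one is created) lands here and does nothing.
      false
    } else {
      initCount += 1 // placeholder for the real static initializers
      true
    }
  }
}
```

Calling OnceOnlyInitSketch.onDriverStart() twice in the same JVM runs the placeholder initializer only on the first call. As noted in the thread, a guard like this also means a changed SparkConf is not picked up on the second call, which is the trade-off tracked in #6862.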
Thanks!
… when session is renewed (apache#6855)
In Spark local mode, the Spark driver and executor both call ListenerApi to run static initializers. This could cause undefined behavior (UB) in some cases, since the initializers were not designed to be called more than once. This patch fixes the issue by adding a check, inLocalMode, and using it to skip the executor-side initializer when it is true. The patch also avoids rerunning the static initializers when the Spark session is recreated.
It also includes essential code cleanups.
Closes #6849
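A rough sketch of the local-mode check described in the commit message, written without a Spark dependency so it can run standalone. It assumes local mode is identified by a spark.master value of local or local[...]. The Map-based configuration, the object name LocalModeSketch, and the helper shouldInitializeOnExecutor are hypothetical; the real patch works on a SparkConf and may detect local mode differently.

```scala
// Hypothetical, dependency-free illustration; not the code from the patch.
object LocalModeSketch {
  // Assumption: local mode means spark.master is "local" or "local[...]".
  def inLocalMode(conf: Map[String, String]): Boolean = {
    val master = conf.getOrElse("spark.master", "")
    master == "local" || master.startsWith("local[")
  }

  // In local mode the driver and executor share one process, so the
  // executor-side initializers are skipped to avoid running them twice.
  def shouldInitializeOnExecutor(conf: Map[String, String]): Boolean =
    !inLocalMode(conf)
}
```

With this sketch, a configuration whose spark.master is local[*] skips the executor-side initializers, while one set to yarn still runs them.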