Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GLUTEN-6849][VL] Call static initializers once in Spark local mode / when session is renewed #6855

Merged
merged 20 commits into from
Aug 19, 2024

Conversation

zhztheplayer
Copy link
Member

@zhztheplayer zhztheplayer commented Aug 15, 2024

In Spark local mode, Spark driver and executor are both calling ListenerApi to run static initializers. This could cause UBs in some cases since the initializers were not deigned to be called more than once. This patch fixes the issue by adding a check inLocalMode then use it to skip executor side initializer when it's true.

The patch also avoids rerunning the static initializers when Spark session is recreated.

And with essential code cleanups.

Closes #6849

@github-actions github-actions bot added CORE works for Gluten Core VELOX RSS labels Aug 15, 2024
Copy link

Run Gluten Clickhouse CI

@apache apache deleted a comment from github-actions bot Aug 15, 2024
@zhztheplayer zhztheplayer changed the title [VL] Call static initializers once from Driver side in Spark local mode [VL] Call static initializers once in Spark local mode Aug 15, 2024
Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

1 similar comment
Copy link

Run Gluten Clickhouse CI

@zhztheplayer zhztheplayer changed the title [VL] Call static initializers once in Spark local mode [VL] Call static initializers once in Spark local mode / when session is renewed Aug 15, 2024
Copy link

Run Gluten Clickhouse CI

@zhztheplayer zhztheplayer changed the title [VL] Call static initializers once in Spark local mode / when session is renewed [GLUTEN-6849][VL] Call static initializers once in Spark local mode / when session is renewed Aug 15, 2024
Copy link

#6849

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

}
getLibraryLoaderForOS(systemName, systemVersion, system)
val conf = pc.conf
if (inLocalMode(conf)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can directly call SparkResourceUtil.isLocalMaster?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am OK to both... Keeping it would shorten the calling code a little bit

Comment on lines +43 to +47
if (!driverInitialized.compareAndSet(false, true)) {
// Make sure we call the static initializers only once.
logInfo(
"Skip rerunning static initializers since they are only supposed to run once." +
" You see this message probably because you are creating a new SparkSession.")
Copy link
Contributor

@zml1206 zml1206 Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t quite understand that onDriverStart will be called multiple times. Can you explain in detail?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It happens in Spark local mode. Spark creates one driver and one executor in that mode, in the current process.

Copy link
Contributor

@zml1206 zml1206 Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this. It should be once for onDriverStart and once for onExecutorStart, but not onDriverStart is called twice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, got it wrong.

onDriverStart will be called twice when spark session is recreated.
onExecutorStart may be called twice when dynamic allocation is enabled, I am not sure about this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give an example of how spark session is recreated, cloneSession or other?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About onExecutorStart, I think it will not called twice because dynamic allocation add new executor is a new jvm.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give an example of how spark session is recreated, cloneSession or other?

Please refer to SparkSession.stop or SparkContext.stop

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If just re-create sparkSession, it will not restart the driver. Re-creating sparkContext will restart the driver, but a new sparkConf may be set, so is it better to re-initialize it once?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, finally we should remove the flags and do re-initializations. See my comment and the issue #6862

@github-actions github-actions bot removed the RSS label Aug 15, 2024
Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

@github-actions github-actions bot added the RSS label Aug 16, 2024
Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

PHILO-HE
PHILO-HE previously approved these changes Aug 16, 2024
Copy link
Contributor

@PHILO-HE PHILO-HE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link

Run Gluten Clickhouse CI

@zhztheplayer zhztheplayer merged commit 37dae7d into apache:main Aug 19, 2024
45 checks passed
sharkdtu pushed a commit to sharkdtu/gluten that referenced this pull request Nov 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CORE works for Gluten Core RSS VELOX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[VL] Move process-wise static initializers out from VeloxListenerApi
4 participants