Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VL] Fix potential double unload of JNI libraries on Java 11+ #6146

Closed
wants to merge 1 commit into from

Conversation

yangzhg
Copy link
Member

@yangzhg yangzhg commented Jun 19, 2024

What changes were proposed in this pull request?

Previously, the addHookForLibUnloading method added a shutdown hook unconditionally to unload JNI libraries. This could potentially lead to double unload issues on Java 11 and later versions where the JVM automatically handles JNI library cleanup.

This commit modifies addHookForLibUnloading to conditionally add the unload hook based on the JVM version:

  • For Java 11 and later, the method adds a no-op hook since the JVM handles the cleanup automatically.
  • For Java versions prior to 11, the method adds the provided hook to ensure proper cleanup.

This change ensures compatibility and prevents potential double unload issues on Java 11 and later.

Fixes: #6145

Copy link

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Copy link

Run Gluten Clickhouse CI

@zhztheplayer
Copy link
Member

Which library Velox's lib loader tries to unload? I remember that option (boolean requireUnload) was only used by CH backend.

@yangzhg yangzhg changed the title Fix potential double unload of JNI libraries on Java 11+ [VL] Fix potential double unload of JNI libraries on Java 11+ Jun 19, 2024
@yangzhg
Copy link
Member Author

yangzhg commented Jun 19, 2024

Which library Velox's lib loader tries to unload? I remember that option (boolean requireUnload) was only used by CH backend.

I'm not sure about which libs tries to unload, but I found after Java 11 there is no need to unload JNI libs, and the unload may cause core dump or UBs.

@zhztheplayer
Copy link
Member

Which library Velox's lib loader tries to unload? I remember that option (boolean requireUnload) was only used by CH backend.

I'm not sure about which libs tries to unload, but I found after Java 11 there is no need to unload JNI libs, and the unload may cause core dump or UBs.

Then I suppose this is not related to #6145 ? I see Fixes: #6145 in PR description.

@zzcclp Regarding lib-unloading, why CH backend needs to unload a library?

@zzcclp
Copy link
Contributor

zzcclp commented Jun 19, 2024

requireUnload

There are some resources need to be released when shutting jvm, otherwise ch will core dump

@yangzhg
Copy link
Member Author

yangzhg commented Jun 20, 2024

Then I suppose this is not related to #6145 ? I see Fixes: #6145 in PR description.

apply this patch the core dump is fixed

@yangzhg
Copy link
Member Author

yangzhg commented Jun 20, 2024

requireUnload

There are some resources need to be released when shutting jvm, otherwise ch will core dump

I will try to fix only in veloxbackend

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

@yangzhg
Copy link
Member Author

yangzhg commented Jun 20, 2024

@zzcclp @zhztheplayer I have made it only worked on Velox backend

@lwz9103
Copy link
Contributor

lwz9103 commented Jun 20, 2024

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Previously, the addHookForLibUnloading method added a shutdown hook
unconditionally to unload JNI libraries. This could potentially
lead to double unload issues on Java 11 and later versions where
the JVM automatically handles JNI library cleanup.

This commit modifies addHookForLibUnloading to conditionally add
the unload hook based on the JVM version:
- For Java 11 and later, the method adds a no-op hook since the JVM
  handles the cleanup automatically.
- For Java versions prior to 11, the method adds the provided hook
  to ensure proper cleanup.

This change ensures compatibility and prevents potential double
unload issues on Java 11 and later.
Copy link

Run Gluten Clickhouse CI

Copy link
Member

@zhztheplayer zhztheplayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. But let's find another way rather than checking backend type in core module.

ShutdownHookManager.addShutdownHook(ShutdownHookManager.SPARK_CONTEXT_SHUTDOWN_PRIORITY)(hook)
ShutdownHookManager.addShutdownHook(ShutdownHookManager.SPARK_CONTEXT_SHUTDOWN_PRIORITY) {
if (
BackendsApiManager.getBackendName == "velox" && SystemUtils.isJavaVersionAtLeast(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (BackendsApiManager.getBackendName == "...")

Seems like a writing we should avoid in core module's code. Let's figure out if there is a better way.

@zhztheplayer
Copy link
Member

@yangzhg I could not reproduce the unloading error since the static storage REQUIRE_UNLOAD_LIBRARY_PATHS looks to be always empty in Velox backend thus the shutdown hook should ideally do nothing.

image

For different build configurations I assume this list should either be empty since we always pass requireUnload=false when using JniLibLoader in Velox backend. Maybe I miss something?

Would you like to collect some log or debug info to see which library is being unloaded in your case? Let's find out the root cause prior to doing any code changes. Thank you very much.

Copy link

github-actions bot commented Aug 9, 2024

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale stale label Aug 9, 2024
Copy link

This PR was auto-closed because it has been stalled for 10 days with no activity. Please feel free to reopen if it is still valid. Thanks.

@github-actions github-actions bot closed this Aug 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[VL] Gluten may core dump when running on JVM 17
4 participants