[GLUTEN-6253] Use internal udf config to avoid modify the original one #6255

Merged (1 commit) on Jun 30, 2024

Conversation

marin-ma
Contributor

@marin-ma marin-ma commented Jun 27, 2024

The current implementation sets spark.gluten.sql.columnar.backend.velox.udfLibraryPaths on the driver side after resolving the library paths. This can overwrite the original setting with a file path that is local to the driver node before the SparkConf is sent to the executors, so executors on other nodes fail when they try to access that path.

This PR stores the resolved library paths in a separate internal config instead, so the original setting is left untouched and the conflict is avoided.

Manually verified on a multi-node cluster.
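
To illustrate the difference, here is a minimal sketch that uses a plain Python dict in place of SparkConf. The two key names come from this PR; `resolve_on_driver`, the `/tmp/driver-local` prefix, and both `*_behavior` functions are hypothetical stand-ins and not Gluten's actual code. How each executor then resolves its own copy of the library is outside this sketch.

```python
USER_KEY = "spark.gluten.sql.columnar.backend.velox.udfLibraryPaths"
INTERNAL_KEY = "spark.gluten.sql.columnar.backend.velox.internal.udfLibraryPaths"


def resolve_on_driver(paths: str) -> str:
    """Hypothetical resolution step: map each library name to a driver-local path."""
    return ",".join(f"/tmp/driver-local/{p}" for p in paths.split(","))


def old_behavior(conf: dict) -> dict:
    """Before the PR: the resolved paths overwrite the user-facing key."""
    out = dict(conf)
    out[USER_KEY] = resolve_on_driver(conf[USER_KEY])
    return out


def new_behavior(conf: dict) -> dict:
    """After the PR: the resolved paths go to the internal key only."""
    out = dict(conf)
    out[INTERNAL_KEY] = resolve_on_driver(conf[USER_KEY])
    return out


user_conf = {USER_KEY: "libmyudf.so"}
# Old: the conf shipped to executors now carries a driver-local path.
print(old_behavior(user_conf)[USER_KEY])
# New: the user's original value survives; the resolved path lives elsewhere.
print(new_behavior(user_conf)[USER_KEY])
```

In the old flow, an executor on another node would receive a value under the user-facing key that only makes sense on the driver's filesystem, which is the failure described above.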

#6253

@marin-ma
Contributor Author

@kecookier Could you help review this? Thanks!

@kecookier
Contributor

Hi @marin-ma, I have some concerns. In yarn-client mode, we may set an extra configuration --conf spark.gluten.sql.columnar.backend.velox.driver.udfLibraryPaths=file:///path/to/libmyudf.so to let the driver read the local UDF library. The complete configuration is as follows:

--files /path/to/gluten/cpp/build/velox/udf/examples/libmyudf.so
--conf spark.gluten.sql.columnar.backend.velox.udfLibraryPaths=libmyudf.so
# Needed for Yarn client mode
--conf spark.gluten.sql.columnar.backend.velox.driver.udfLibraryPaths=file:///path/to/libmyudf.so

It seems that spark.gluten.sql.columnar.backend.velox.internal.udfLibraryPaths has the same meaning as the driver-side config. Could you please clarify the difference?
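
For reference, the configuration above could be assembled programmatically along these lines. This is only a sketch: `udf_submit_args` is a hypothetical helper, and it assumes the driver reads the library from the same absolute local path that is passed to `--files`, which may not hold in every deployment.

```python
import posixpath
from typing import List

UDF_PATHS_KEY = "spark.gluten.sql.columnar.backend.velox.udfLibraryPaths"
DRIVER_UDF_PATHS_KEY = (
    "spark.gluten.sql.columnar.backend.velox.driver.udfLibraryPaths"
)


def udf_submit_args(local_lib: str, yarn_client: bool) -> List[str]:
    """Build the UDF-related spark-submit arguments (hypothetical helper).

    local_lib is assumed to be an absolute POSIX path to the UDF library.
    """
    lib_name = posixpath.basename(local_lib)
    args = [
        "--files", local_lib,
        # Executors refer to the distributed file by its bare name.
        "--conf", f"{UDF_PATHS_KEY}={lib_name}",
    ]
    if yarn_client:
        # Yarn client mode: point the driver at its local copy via a file URI.
        args += ["--conf", f"{DRIVER_UDF_PATHS_KEY}=file://{local_lib}"]
    return args


print(" ".join(udf_submit_args("/path/to/libmyudf.so", yarn_client=True)))
```

Only the two user-facing keys appear here; nothing in this sketch sets the internal key.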

@marin-ma
Contributor Author

@kecookier That's the exact configuration I used in #6253.

spark.gluten.sql.columnar.backend.velox.udfLibraryPaths and spark.gluten.sql.columnar.backend.velox.driver.udfLibraryPaths are the configurations exposed to users. spark.gluten.sql.columnar.backend.velox.internal.udfLibraryPaths is an internal configuration that stores the resolved UDF library paths; the native side reads it when loading the libraries. See https://github.com/apache/incubator-gluten/pull/6255/files#diff-a633836e086157189ba6a590a79c4823e73937f896ecab711a324586e13aa73aL102-R102

@kecookier
Contributor

Thanks for your clarification, it looks good to me.

@FelixYBW FelixYBW merged commit 0b34e8e into apache:main Jun 30, 2024
42 checks passed
3 participants