[GLUTEN-6849][VL] Call static initializers once in Spark local mode / when session is renewed #6855
Merged
zhztheplayer merged 20 commits into apache:main from zhztheplayer:wip-init-once-local-mode on Aug 19, 2024
Commits (20)
d6008f6 nits (zhztheplayer)
173a482 nits (zhztheplayer)
721323b fixup (zhztheplayer)
5cbca36 fixup (zhztheplayer)
863486e fixup (zhztheplayer)
7966d4b fixup (zhztheplayer)
da800f8 fixup (zhztheplayer)
2eb1b70 fixup (zhztheplayer)
cbd1eb7 fixup (zhztheplayer)
c8e518b fixup (zhztheplayer)
badef79 fixup (zhztheplayer)
64066b4 fixup (zhztheplayer)
a058504 fixup (zhztheplayer)
1d86f3a fixup (zhztheplayer)
e4a53dd fixup (zhztheplayer)
f1db25d fixup (zhztheplayer)
5b159a0 fixup (zhztheplayer)
4096a53 fixup (zhztheplayer)
17f6232 fixup (zhztheplayer)
19d3313 fixup (zhztheplayer)
@@ -18,7 +18,6 @@ package org.apache.gluten.backendsapi.velox
import org.apache.gluten.GlutenConfig
import org.apache.gluten.backendsapi.ListenerApi
import org.apache.gluten.exception.GlutenException
import org.apache.gluten.execution.datasource.{GlutenOrcWriterInjects, GlutenParquetWriterInjects, GlutenRowSplitter}
import org.apache.gluten.expression.UDFMappings
import org.apache.gluten.init.NativeBackendInitializer

@@ -27,138 +26,73 @@ import org.apache.gluten.vectorized.{JniLibLoader, JniWorkspace}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.api.plugin.PluginContext
import org.apache.spark.internal.Logging
import org.apache.spark.sql.execution.datasources.velox.{VeloxOrcWriterInjects, VeloxParquetWriterInjects, VeloxRowSplitter}
import org.apache.spark.sql.expression.UDFResolver
import org.apache.spark.sql.internal.{GlutenConfigUtil, StaticSQLConf}
import org.apache.spark.util.SparkDirectoryUtil
import org.apache.spark.util.{SparkDirectoryUtil, SparkResourceUtil}

import org.apache.commons.lang3.StringUtils

import scala.sys.process._
import java.util.concurrent.atomic.AtomicBoolean

class VeloxListenerApi extends ListenerApi {
  private val ARROW_VERSION = "1500"
class VeloxListenerApi extends ListenerApi with Logging {
  import VeloxListenerApi._

  override def onDriverStart(sc: SparkContext, pc: PluginContext): Unit = {
    if (!driverInitialized.compareAndSet(false, true)) {
      // Make sure we call the static initializers only once.
      logInfo(
        "Skip rerunning static initializers since they are only supposed to run once." +
          " You see this message probably because you are creating a new SparkSession.")
      return
    }

    // Static initializers for driver.
    val conf = pc.conf()
    // sql table cache serializer
    // Sql table cache serializer.
    if (conf.getBoolean(GlutenConfig.COLUMNAR_TABLE_CACHE_ENABLED.key, defaultValue = false)) {
      conf.set(
        StaticSQLConf.SPARK_CACHE_SERIALIZER.key,
        "org.apache.spark.sql.execution.ColumnarCachedBatchSerializer")
    }
    initialize(conf, isDriver = true)
    SparkDirectoryUtil.init(conf)
    UDFResolver.resolveUdfConf(conf, isDriver = true)
    initialize(conf)
  }

  override def onDriverShutdown(): Unit = shutdown()

  override def onExecutorStart(pc: PluginContext): Unit = {
    initialize(pc.conf(), isDriver = false)
  }

  override def onExecutorShutdown(): Unit = shutdown()

  private def getLibraryLoaderForOS(
      systemName: String,
      systemVersion: String,
      system: String): SharedLibraryLoader = {
    if (systemName.contains("Ubuntu") && systemVersion.startsWith("20.04")) {
      new SharedLibraryLoaderUbuntu2004
    } else if (systemName.contains("Ubuntu") && systemVersion.startsWith("22.04")) {
      new SharedLibraryLoaderUbuntu2204
    } else if (systemName.contains("CentOS") && systemVersion.startsWith("9")) {
      new SharedLibraryLoaderCentos9
    } else if (systemName.contains("CentOS") && systemVersion.startsWith("8")) {
      new SharedLibraryLoaderCentos8
    } else if (systemName.contains("CentOS") && systemVersion.startsWith("7")) {
      new SharedLibraryLoaderCentos7
    } else if (systemName.contains("Alibaba Cloud Linux") && systemVersion.startsWith("3")) {
      new SharedLibraryLoaderCentos8
    } else if (systemName.contains("Alibaba Cloud Linux") && systemVersion.startsWith("2")) {
      new SharedLibraryLoaderCentos7
    } else if (systemName.contains("Anolis") && systemVersion.startsWith("8")) {
      new SharedLibraryLoaderCentos8
    } else if (systemName.contains("Anolis") && systemVersion.startsWith("7")) {
      new SharedLibraryLoaderCentos7
    } else if (system.contains("tencentos") && system.contains("2.4")) {
      new SharedLibraryLoaderCentos7
    } else if (system.contains("tencentos") && system.contains("3.2")) {
      new SharedLibraryLoaderCentos8
    } else if (systemName.contains("Red Hat") && systemVersion.startsWith("9")) {
      new SharedLibraryLoaderCentos9
    } else if (systemName.contains("Red Hat") && systemVersion.startsWith("8")) {
      new SharedLibraryLoaderCentos8
    } else if (systemName.contains("Red Hat") && systemVersion.startsWith("7")) {
      new SharedLibraryLoaderCentos7
    } else if (systemName.contains("Debian") && systemVersion.startsWith("11")) {
      new SharedLibraryLoaderDebian11
    } else if (systemName.contains("Debian") && systemVersion.startsWith("12")) {
      new SharedLibraryLoaderDebian12
    } else {
      throw new GlutenException(
        s"Found unsupported OS($systemName, $systemVersion)! Currently, Gluten's Velox backend" +
          " only supports Ubuntu 20.04/22.04, CentOS 7/8, " +
          "Alibaba Cloud Linux 2/3 & Anolis 7/8, tencentos 2.4/3.2, RedHat 7/8, " +
          "Debian 11/12.")
    if (!executorInitialized.compareAndSet(false, true)) {
      // Make sure we call the static initializers only once.
      logInfo(
        "Skip rerunning static initializers since they are only supposed to run once." +
          " You see this message probably because you are creating a new SparkSession.")
      return
    }
  }

  private def loadLibFromJar(load: JniLibLoader, conf: SparkConf): Unit = {
    val systemName = conf.getOption(GlutenConfig.GLUTEN_LOAD_LIB_OS)
    val loader = if (systemName.isDefined) {
      val systemVersion = conf.getOption(GlutenConfig.GLUTEN_LOAD_LIB_OS_VERSION)
      if (systemVersion.isEmpty) {
        throw new GlutenException(
          s"${GlutenConfig.GLUTEN_LOAD_LIB_OS_VERSION} must be specified when specifies the " +
            s"${GlutenConfig.GLUTEN_LOAD_LIB_OS}")
      }
      getLibraryLoaderForOS(systemName.get, systemVersion.get, "")
    } else {
      val system = "cat /etc/os-release".!!
      val systemNamePattern = "^NAME=\"?(.*)\"?".r
      val systemVersionPattern = "^VERSION=\"?(.*)\"?".r
      val systemInfoLines = system.stripMargin.split("\n")
      val systemNamePattern(systemName) =
        systemInfoLines.find(_.startsWith("NAME=")).getOrElse("")
      val systemVersionPattern(systemVersion) =
        systemInfoLines.find(_.startsWith("VERSION=")).getOrElse("")
      if (systemName.isEmpty || systemVersion.isEmpty) {
        throw new GlutenException("Failed to get OS name and version info.")
      }
      getLibraryLoaderForOS(systemName, systemVersion, system)
    val conf = pc.conf
    if (inLocalMode(conf)) {
(Review thread on the line above)
Comment: Maybe we can directly call
Reply: I am OK to both... Keeping it would shorten the calling code a little bit
      // Don't do static initializations from executor side in local mode.
      // Driver already did that.
      logInfo(
        "Gluten is running with Spark local mode. Skip running static initializer for executor.")
      return
    }
    loader.loadLib(load)
  }

  private def loadLibWithLinux(conf: SparkConf, loader: JniLibLoader): Unit = {
    if (
      conf.getBoolean(
        GlutenConfig.GLUTEN_LOAD_LIB_FROM_JAR,
        GlutenConfig.GLUTEN_LOAD_LIB_FROM_JAR_DEFAULT)
    ) {
      loadLibFromJar(loader, conf)
    }
    // Static initializers for executor.
    SparkDirectoryUtil.init(conf)
    UDFResolver.resolveUdfConf(conf, isDriver = false)
    initialize(conf)
  }

  private def loadLibWithMacOS(loader: JniLibLoader): Unit = {
    // Placeholder for loading shared libs on MacOS if user needs.
  }
  override def onExecutorShutdown(): Unit = shutdown()

  private def initialize(conf: SparkConf, isDriver: Boolean): Unit = {
    SparkDirectoryUtil.init(conf)
    UDFResolver.resolveUdfConf(conf, isDriver = isDriver)
  private def initialize(conf: SparkConf): Unit = {
    if (conf.getBoolean(GlutenConfig.GLUTEN_DEBUG_KEEP_JNI_WORKSPACE, defaultValue = false)) {
      val debugDir = conf.get(GlutenConfig.GLUTEN_DEBUG_KEEP_JNI_WORKSPACE_DIR)
      JniWorkspace.enableDebug(debugDir)
    }
    val loader = JniWorkspace.getDefault.libLoader

    val osName = System.getProperty("os.name")
    if (osName.startsWith("Mac OS X") || osName.startsWith("macOS")) {
      loadLibWithMacOS(loader)
    } else {
      loadLibWithLinux(conf, loader)
    }

    // Set the system properties.
    // Use appending policy for children with the same name in a arrow struct vector.

@@ -167,6 +101,13 @@ class VeloxListenerApi extends ListenerApi {
    // Load supported hive/python/scala udfs
    UDFMappings.loadFromSparkConf(conf)

    // Initial library loader.
    val loader = JniWorkspace.getDefault.libLoader

    // Load shared native libraries the backend libraries depend on.
    SharedLibraryLoader.load(conf, loader)

    // Load backend libraries.
    val libPath = conf.get(GlutenConfig.GLUTEN_LIB_PATH, StringUtils.EMPTY)
    if (StringUtils.isNotBlank(libPath)) { // Path based load. Ignore all other loadees.
      JniLibLoader.loadFromPath(libPath, false)

@@ -176,11 +117,11 @@ class VeloxListenerApi extends ListenerApi {
      loader.mapAndLoad(VeloxBackend.BACKEND_NAME, false)
    }

    // Initial native backend with configurations.
    val parsed = GlutenConfigUtil.parseConfig(conf.getAll.toMap)
    NativeBackendInitializer.initializeBackend(parsed)

    // inject backend-specific implementations to override spark classes
    // FIXME: The following set instances twice in local mode?
    // Inject backend-specific implementations to override spark classes.
    GlutenParquetWriterInjects.setInstance(new VeloxParquetWriterInjects())
    GlutenOrcWriterInjects.setInstance(new VeloxOrcWriterInjects())
    GlutenRowSplitter.setInstance(new VeloxRowSplitter())

@@ -191,4 +132,13 @@ class VeloxListenerApi extends ListenerApi {
  }
}

object VeloxListenerApi {}
object VeloxListenerApi {
  // TODO: Implement graceful shutdown and remove these flags.
  // As spark conf may change when active Spark session is recreated.
  private val driverInitialized: AtomicBoolean = new AtomicBoolean(false)
  private val executorInitialized: AtomicBoolean = new AtomicBoolean(false)

  private def inLocalMode(conf: SparkConf): Boolean = {
    SparkResourceUtil.isLocalMaster(conf)
  }
}
Comment: I don't quite understand why onDriverStart would be called multiple times. Can you explain in detail?
Reply: It happens in Spark local mode. Spark creates one driver and one executor in that mode, in the current process.
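To make "in the current process" concrete, here is a minimal, hypothetical probe (ProbePlugin and LocalModeDemo are illustrative names, not Gluten code) built on Spark's plugin API. With master=local[*], both the driver-side and the executor-side init callbacks print the same "pid@host" value, which is exactly why VeloxListenerApi needs the once-only guards.

```scala
import java.lang.management.ManagementFactory
import java.util.{Map => JMap}

import org.apache.spark.SparkContext
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}
import org.apache.spark.sql.SparkSession

// Hypothetical probe: logs the JVM name ("pid@host") for each plugin callback.
class ProbePlugin extends SparkPlugin {
  private def jvm: String = ManagementFactory.getRuntimeMXBean.getName

  override def driverPlugin(): DriverPlugin = new DriverPlugin {
    override def init(sc: SparkContext, ctx: PluginContext): JMap[String, String] = {
      println(s"driver-side init in JVM $jvm")
      java.util.Collections.emptyMap[String, String]
    }
  }

  override def executorPlugin(): ExecutorPlugin = new ExecutorPlugin {
    override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit =
      println(s"executor-side init in JVM $jvm")
  }
}

object LocalModeDemo extends App {
  // In local mode the driver and the single executor share this JVM, so both
  // init messages report the same "pid@host".
  val spark = SparkSession
    .builder()
    .master("local[*]")
    .config("spark.plugins", classOf[ProbePlugin].getName)
    .getOrCreate()
  spark.stop()
}
```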
Comment: I understand this. It should be called once for onDriverStart and once for onExecutorStart, but that does not mean onDriverStart is called twice.
Reply: Oops, got it wrong. onDriverStart will be called twice when the Spark session is recreated. onExecutorStart may be called twice when dynamic allocation is enabled; I am not sure about this one.
Comment: Can you give an example of how the Spark session is recreated, cloneSession or something else?
Comment: About onExecutorStart, I think it will not be called twice, because an executor added by dynamic allocation is a new JVM.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please refer to
SparkSession.stop
orSparkContext.stop
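A minimal sketch of the recreation path being pointed at here (plain Spark API, not code from this PR): stopping a SparkSession also stops the underlying SparkContext, and the next getOrCreate() builds a fresh context, which runs driver-side plugin initialization, i.e. onDriverStart, a second time in the same JVM.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: recreate the session within one JVM, as discussed above.
object RecreateSessionDemo extends App {
  val first = SparkSession.builder().master("local[2]").getOrCreate()
  first.stop() // also stops the underlying SparkContext

  // A fresh SparkContext starts here; without the AtomicBoolean guards the
  // static initializers in VeloxListenerApi would run again at this point.
  val second = SparkSession.builder().master("local[2]").getOrCreate()
  second.stop()
}
```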
Comment: If we just re-create the SparkSession, it will not restart the driver. Re-creating the SparkContext will restart the driver, but a new SparkConf may be set, so is it better to re-initialize once?
Reply: Yes, eventually we should remove the flags and do the re-initializations. See my comment and the issue #6862.