Skip to content

Commit

Permalink
[SPARK-33212][BUILD] Upgrade to Hadoop 3.2.2 and move to shaded clien…
Browse files Browse the repository at this point in the history
…ts for Hadoop 3.x profile

### What changes were proposed in this pull request?

This:
1. switches Spark to use shaded Hadoop clients, namely hadoop-client-api and hadoop-client-runtime, for Hadoop 3.x.
2. upgrade built-in version for Hadoop 3.x to Hadoop 3.2.2

Note that for Hadoop 2.7, we'll still use the same modules such as hadoop-client.

In order to still keep default Hadoop profile to be hadoop-3.2, this defines the following Maven properties:

```
hadoop-client-api.artifact
hadoop-client-runtime.artifact
hadoop-client-minicluster.artifact
```

which default to:
```
hadoop-client-api
hadoop-client-runtime
hadoop-client-minicluster
```
but all switch to `hadoop-client` when the Hadoop profile is hadoop-2.7. A side affect from this is we'll import the same dependency multiple times. For this I have to disable Maven enforcer `banDuplicatePomDependencyVersions`.

Besides above, there are the following changes:
- explicitly add a few dependencies which are imported via transitive dependencies from Hadoop jars, but are removed from the shaded client jars.
- removed the use of `ProxyUriUtils.getPath` from `ApplicationMaster` which is a server-side/private API.
- modified `IsolatedClientLoader` to exclude `hadoop-auth` jars when Hadoop version is 3.x. This change should only matter when we're not sharing Hadoop classes with Spark (which is _mostly_ used in tests).

### Why are the changes needed?

Hadoop 3.2.2 is released with new features and bug fixes, so it's good for the Spark community to adopt it. However, latest Hadoop versions starting from Hadoop 3.2.1 have upgraded to use Guava 27+. In order to resolve Guava conflicts, this takes the approach by switching to shaded client jars provided by Hadoop. This also has the benefits of avoid pulling other 3rd party dependencies from Hadoop side so as to avoid more potential future conflicts.

### Does this PR introduce _any_ user-facing change?

When people use Spark with `hadoop-provided` option, they should make sure class path contains `hadoop-client-api` and `hadoop-client-runtime` jars. In addition, they may need to make sure these jars appear before other Hadoop jars in the order. Otherwise, classes may be loaded from the other non-shaded Hadoop jars and cause potential conflicts.

### How was this patch tested?

Relying on existing tests.

Closes apache#30701 from sunchao/test-hadoop-3.2.2.

Authored-by: Chao Sun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
  • Loading branch information
sunchao authored and dongjoon-hyun committed Jan 15, 2021
1 parent a235c3b commit b6f46ca
Show file tree
Hide file tree
Showing 18 changed files with 191 additions and 109 deletions.
8 changes: 7 additions & 1 deletion common/network-yarn/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,13 @@
<!-- Provided dependencies -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<artifactId>${hadoop-client-api.artifact}</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>${hadoop-client-runtime.artifact}</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
Expand Down
16 changes: 15 additions & 1 deletion core/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,13 @@
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<artifactId>${hadoop-client-api.artifact}</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>${hadoop-client-runtime.artifact}</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
Expand Down Expand Up @@ -177,6 +183,14 @@
<groupId>org.apache.commons</groupId>
<artifactId>commons-text</artifactId>
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
</dependency>
<dependency>
<groupId>commons-collections</groupId>
<artifactId>commons-collections</artifactId>
</dependency>
<dependency>
<groupId>com.google.code.findbugs</groupId>
<artifactId>jsr305</artifactId>
Expand Down
3 changes: 1 addition & 2 deletions dev/deps/spark-deps-hadoop-2.7-hive-2.3
Original file line number Diff line number Diff line change
Expand Up @@ -128,7 +128,7 @@ javassist/3.25.0-GA//javassist-3.25.0-GA.jar
javax.inject/1//javax.inject-1.jar
javax.jdo/3.2.0-m3//javax.jdo-3.2.0-m3.jar
javolution/5.5.1//javolution-5.5.1.jar
jaxb-api/2.2.2//jaxb-api-2.2.2.jar
jaxb-api/2.2.11//jaxb-api-2.2.11.jar
jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar
jcl-over-slf4j/1.7.30//jcl-over-slf4j-1.7.30.jar
jdo-api/3.0.1//jdo-api-3.0.1.jar
Expand Down Expand Up @@ -227,7 +227,6 @@ spire-macros_2.12/0.17.0-M1//spire-macros_2.12-0.17.0-M1.jar
spire-platform_2.12/0.17.0-M1//spire-platform_2.12-0.17.0-M1.jar
spire-util_2.12/0.17.0-M1//spire-util_2.12-0.17.0-M1.jar
spire_2.12/0.17.0-M1//spire_2.12-0.17.0-M1.jar
stax-api/1.0-2//stax-api-1.0-2.jar
stax-api/1.0.1//stax-api-1.0.1.jar
stream/2.9.6//stream-2.9.6.jar
super-csv/2.2.0//super-csv-2.2.0.jar
Expand Down
53 changes: 2 additions & 51 deletions dev/deps/spark-deps-hadoop-3.2-hive-2.3
Original file line number Diff line number Diff line change
Expand Up @@ -3,15 +3,13 @@ JLargeArrays/1.5//JLargeArrays-1.5.jar
JTransforms/3.1//JTransforms-3.1.jar
RoaringBitmap/0.9.0//RoaringBitmap-0.9.0.jar
ST4/4.0.4//ST4-4.0.4.jar
accessors-smart/1.2//accessors-smart-1.2.jar
activation/1.1.1//activation-1.1.1.jar
aircompressor/0.16//aircompressor-0.16.jar
algebra_2.12/2.0.0-M2//algebra_2.12-2.0.0-M2.jar
annotations/17.0.0//annotations-17.0.0.jar
antlr-runtime/3.5.2//antlr-runtime-3.5.2.jar
antlr4-runtime/4.8-1//antlr4-runtime-4.8-1.jar
aopalliance-repackaged/2.6.1//aopalliance-repackaged-2.6.1.jar
aopalliance/1.0//aopalliance-1.0.jar
arpack_combined_all/0.1//arpack_combined_all-0.1.jar
arrow-format/2.0.0//arrow-format-2.0.0.jar
arrow-memory-core/2.0.0//arrow-memory-core-2.0.0.jar
Expand All @@ -28,15 +26,12 @@ breeze_2.12/1.0//breeze_2.12-1.0.jar
cats-kernel_2.12/2.0.0-M4//cats-kernel_2.12-2.0.0-M4.jar
chill-java/0.9.5//chill-java-0.9.5.jar
chill_2.12/0.9.5//chill_2.12-0.9.5.jar
commons-beanutils/1.9.4//commons-beanutils-1.9.4.jar
commons-cli/1.2//commons-cli-1.2.jar
commons-codec/1.15//commons-codec-1.15.jar
commons-collections/3.2.2//commons-collections-3.2.2.jar
commons-compiler/3.0.16//commons-compiler-3.0.16.jar
commons-compress/1.20//commons-compress-1.20.jar
commons-configuration2/2.1.1//commons-configuration2-2.1.1.jar
commons-crypto/1.1.0//commons-crypto-1.1.0.jar
commons-daemon/1.0.13//commons-daemon-1.0.13.jar
commons-dbcp/1.4//commons-dbcp-1.4.jar
commons-httpclient/3.1//commons-httpclient-3.1.jar
commons-io/2.5//commons-io-2.5.jar
Expand All @@ -56,30 +51,13 @@ datanucleus-api-jdo/4.2.4//datanucleus-api-jdo-4.2.4.jar
datanucleus-core/4.1.17//datanucleus-core-4.1.17.jar
datanucleus-rdbms/4.1.19//datanucleus-rdbms-4.1.19.jar
derby/10.14.2.0//derby-10.14.2.0.jar
dnsjava/2.1.7//dnsjava-2.1.7.jar
dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar
ehcache/3.3.1//ehcache-3.3.1.jar
flatbuffers-java/1.9.0//flatbuffers-java-1.9.0.jar
generex/1.0.2//generex-1.0.2.jar
geronimo-jcache_1.0_spec/1.0-alpha-1//geronimo-jcache_1.0_spec-1.0-alpha-1.jar
gson/2.2.4//gson-2.2.4.jar
guava/14.0.1//guava-14.0.1.jar
guice-servlet/4.0//guice-servlet-4.0.jar
guice/4.0//guice-4.0.jar
hadoop-annotations/3.2.0//hadoop-annotations-3.2.0.jar
hadoop-auth/3.2.0//hadoop-auth-3.2.0.jar
hadoop-client/3.2.0//hadoop-client-3.2.0.jar
hadoop-common/3.2.0//hadoop-common-3.2.0.jar
hadoop-hdfs-client/3.2.0//hadoop-hdfs-client-3.2.0.jar
hadoop-mapreduce-client-common/3.2.0//hadoop-mapreduce-client-common-3.2.0.jar
hadoop-mapreduce-client-core/3.2.0//hadoop-mapreduce-client-core-3.2.0.jar
hadoop-mapreduce-client-jobclient/3.2.0//hadoop-mapreduce-client-jobclient-3.2.0.jar
hadoop-yarn-api/3.2.0//hadoop-yarn-api-3.2.0.jar
hadoop-yarn-client/3.2.0//hadoop-yarn-client-3.2.0.jar
hadoop-yarn-common/3.2.0//hadoop-yarn-common-3.2.0.jar
hadoop-yarn-registry/3.2.0//hadoop-yarn-registry-3.2.0.jar
hadoop-yarn-server-common/3.2.0//hadoop-yarn-server-common-3.2.0.jar
hadoop-yarn-server-web-proxy/3.2.0//hadoop-yarn-server-web-proxy-3.2.0.jar
hadoop-client-api/3.2.2//hadoop-client-api-3.2.2.jar
hadoop-client-runtime/3.2.2//hadoop-client-runtime-3.2.2.jar
hive-beeline/2.3.7//hive-beeline-2.3.7.jar
hive-cli/2.3.7//hive-cli-2.3.7.jar
hive-common/2.3.7//hive-common-2.3.7.jar
Expand Down Expand Up @@ -109,8 +87,6 @@ jackson-core/2.11.4//jackson-core-2.11.4.jar
jackson-databind/2.11.4//jackson-databind-2.11.4.jar
jackson-dataformat-yaml/2.11.4//jackson-dataformat-yaml-2.11.4.jar
jackson-datatype-jsr310/2.11.2//jackson-datatype-jsr310-2.11.2.jar
jackson-jaxrs-base/2.9.5//jackson-jaxrs-base-2.9.5.jar
jackson-jaxrs-json-provider/2.9.5//jackson-jaxrs-json-provider-2.9.5.jar
jackson-mapper-asl/1.9.13//jackson-mapper-asl-1.9.13.jar
jackson-module-jaxb-annotations/2.11.4//jackson-module-jaxb-annotations-2.11.4.jar
jackson-module-paranamer/2.11.4//jackson-module-paranamer-2.11.4.jar
Expand All @@ -124,13 +100,10 @@ jakarta.ws.rs-api/2.1.6//jakarta.ws.rs-api-2.1.6.jar
jakarta.xml.bind-api/2.3.2//jakarta.xml.bind-api-2.3.2.jar
janino/3.0.16//janino-3.0.16.jar
javassist/3.25.0-GA//javassist-3.25.0-GA.jar
javax.inject/1//javax.inject-1.jar
javax.jdo/3.2.0-m3//javax.jdo-3.2.0-m3.jar
javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar
javolution/5.5.1//javolution-5.5.1.jar
jaxb-api/2.2.11//jaxb-api-2.2.11.jar
jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar
jcip-annotations/1.0-1//jcip-annotations-1.0-1.jar
jcl-over-slf4j/1.7.30//jcl-over-slf4j-1.7.30.jar
jdo-api/3.0.1//jdo-api-3.0.1.jar
jersey-client/2.30//jersey-client-2.30.jar
Expand All @@ -144,30 +117,14 @@ jline/2.14.6//jline-2.14.6.jar
joda-time/2.10.5//joda-time-2.10.5.jar
jodd-core/3.5.2//jodd-core-3.5.2.jar
jpam/1.1//jpam-1.1.jar
json-smart/2.3//json-smart-2.3.jar
json/1.8//json-1.8.jar
json4s-ast_2.12/3.7.0-M5//json4s-ast_2.12-3.7.0-M5.jar
json4s-core_2.12/3.7.0-M5//json4s-core_2.12-3.7.0-M5.jar
json4s-jackson_2.12/3.7.0-M5//json4s-jackson_2.12-3.7.0-M5.jar
json4s-scalap_2.12/3.7.0-M5//json4s-scalap_2.12-3.7.0-M5.jar
jsp-api/2.1//jsp-api-2.1.jar
jsr305/3.0.0//jsr305-3.0.0.jar
jta/1.1//jta-1.1.jar
jul-to-slf4j/1.7.30//jul-to-slf4j-1.7.30.jar
kerb-admin/1.0.1//kerb-admin-1.0.1.jar
kerb-client/1.0.1//kerb-client-1.0.1.jar
kerb-common/1.0.1//kerb-common-1.0.1.jar
kerb-core/1.0.1//kerb-core-1.0.1.jar
kerb-crypto/1.0.1//kerb-crypto-1.0.1.jar
kerb-identity/1.0.1//kerb-identity-1.0.1.jar
kerb-server/1.0.1//kerb-server-1.0.1.jar
kerb-simplekdc/1.0.1//kerb-simplekdc-1.0.1.jar
kerb-util/1.0.1//kerb-util-1.0.1.jar
kerby-asn1/1.0.1//kerby-asn1-1.0.1.jar
kerby-config/1.0.1//kerby-config-1.0.1.jar
kerby-pkix/1.0.1//kerby-pkix-1.0.1.jar
kerby-util/1.0.1//kerby-util-1.0.1.jar
kerby-xdr/1.0.1//kerby-xdr-1.0.1.jar
kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar
kubernetes-client/4.12.0//kubernetes-client-4.12.0.jar
kubernetes-model-admissionregistration/4.12.0//kubernetes-model-admissionregistration-4.12.0.jar
Expand Down Expand Up @@ -205,9 +162,7 @@ metrics-json/4.1.1//metrics-json-4.1.1.jar
metrics-jvm/4.1.1//metrics-jvm-4.1.1.jar
minlog/1.3.0//minlog-1.3.0.jar
netty-all/4.1.51.Final//netty-all-4.1.51.Final.jar
nimbus-jose-jwt/4.41.1//nimbus-jose-jwt-4.41.1.jar
objenesis/2.6//objenesis-2.6.jar
okhttp/2.7.5//okhttp-2.7.5.jar
okhttp/3.12.12//okhttp-3.12.12.jar
okio/1.14.0//okio-1.14.0.jar
opencsv/2.3//opencsv-2.3.jar
Expand All @@ -226,7 +181,6 @@ parquet-jackson/1.10.1//parquet-jackson-1.10.1.jar
protobuf-java/2.5.0//protobuf-java-2.5.0.jar
py4j/0.10.9.1//py4j-0.10.9.1.jar
pyrolite/4.30//pyrolite-4.30.jar
re2j/1.1//re2j-1.1.jar
scala-collection-compat_2.12/2.1.1//scala-collection-compat_2.12-2.1.1.jar
scala-compiler/2.12.10//scala-compiler-2.12.10.jar
scala-library/2.12.10//scala-library-2.12.10.jar
Expand All @@ -244,15 +198,12 @@ spire-platform_2.12/0.17.0-M1//spire-platform_2.12-0.17.0-M1.jar
spire-util_2.12/0.17.0-M1//spire-util_2.12-0.17.0-M1.jar
spire_2.12/0.17.0-M1//spire_2.12-0.17.0-M1.jar
stax-api/1.0.1//stax-api-1.0.1.jar
stax2-api/3.1.4//stax2-api-3.1.4.jar
stream/2.9.6//stream-2.9.6.jar
super-csv/2.2.0//super-csv-2.2.0.jar
threeten-extra/1.5.0//threeten-extra-1.5.0.jar
token-provider/1.0.1//token-provider-1.0.1.jar
transaction-api/1.1//transaction-api-1.1.jar
univocity-parsers/2.9.0//univocity-parsers-2.9.0.jar
velocity/1.5//velocity-1.5.jar
woodstox-core/5.0.3//woodstox-core-5.0.3.jar
xbean-asm7-shaded/4.15//xbean-asm7-shaded-4.15.jar
xz/1.5//xz-1.5.jar
zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar
Expand Down
8 changes: 7 additions & 1 deletion external/kafka-0-10-assembly/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -71,9 +71,15 @@
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<artifactId>${hadoop-client-api.artifact}</artifactId>
<version>${hadoop.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>${hadoop-client-runtime.artifact}</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro-mapred</artifactId>
Expand Down
4 changes: 4 additions & 0 deletions external/kafka-0-10-sql/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,10 @@
<artifactId>kafka-clients</artifactId>
<version>${kafka.version}</version>
</dependency>
<dependency>
<groupId>com.google.code.findbugs</groupId>
<artifactId>jsr305</artifactId>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-pool2</artifactId>
Expand Down
5 changes: 5 additions & 0 deletions external/kafka-0-10-token-provider/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,11 @@
<artifactId>mockito-core</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>${hadoop-client-runtime.artifact}</artifactId>
<scope>${hadoop.deps.scope}</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-tags_${scala.binary.version}</artifactId>
Expand Down
8 changes: 7 additions & 1 deletion external/kinesis-asl-assembly/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -91,9 +91,15 @@
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<artifactId>${hadoop-client-api.artifact}</artifactId>
<version>${hadoop.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>${hadoop-client-runtime.artifact}</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro-ipc</artifactId>
Expand Down
7 changes: 6 additions & 1 deletion hadoop-cloud/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -58,10 +58,15 @@
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<artifactId>${hadoop-client-api.artifact}</artifactId>
<version>${hadoop.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>${hadoop-client-runtime.artifact}</artifactId>
<version>${hadoop.version}</version>
</dependency>
<!--
the AWS module pulls in jackson; its transitive dependencies can create
intra-jackson-module version problems.
Expand Down
9 changes: 8 additions & 1 deletion launcher/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -81,7 +81,14 @@
<!-- Not needed by the test code, but referenced by SparkSubmit which is used by the tests. -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<artifactId>${hadoop-client-api.artifact}</artifactId>
<version>${hadoop.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>${hadoop-client-runtime.artifact}</artifactId>
<version>${hadoop.version}</version>
<scope>test</scope>
</dependency>
</dependencies>
Expand Down
Loading

0 comments on commit b6f46ca

Please sign in to comment.