[GLUTEN-7004][CORE] Bump Spark version to 3.4.3 (apache#7115)
Yohahaha authored and hengzhen.sq committed Sep 11, 2024
1 parent f449b59 commit b3a9512
Showing 10 changed files with 34 additions and 32 deletions.
10 changes: 5 additions & 5 deletions .github/workflows/util/install_spark_resources.sh
@@ -50,13 +50,13 @@ case "$1" in
3.4)
# Spark-3.4
cd ${INSTALL_DIR} && \
- wget -nv https://archive.apache.org/dist/spark/spark-3.4.2/spark-3.4.2-bin-hadoop3.tgz && \
- tar --strip-components=1 -xf spark-3.4.2-bin-hadoop3.tgz spark-3.4.2-bin-hadoop3/jars/ && \
- rm -rf spark-3.4.2-bin-hadoop3.tgz && \
+ wget -nv https://archive.apache.org/dist/spark/spark-3.4.3/spark-3.4.3-bin-hadoop3.tgz && \
+ tar --strip-components=1 -xf spark-3.4.3-bin-hadoop3.tgz spark-3.4.3-bin-hadoop3/jars/ && \
+ rm -rf spark-3.4.3-bin-hadoop3.tgz && \
mkdir -p ${INSTALL_DIR}/shims/spark34/spark_home/assembly/target/scala-2.12 && \
mv jars ${INSTALL_DIR}/shims/spark34/spark_home/assembly/target/scala-2.12 && \
- wget -nv https://github.com/apache/spark/archive/refs/tags/v3.4.2.tar.gz && \
- tar --strip-components=1 -xf v3.4.2.tar.gz spark-3.4.2/sql/core/src/test/resources/ && \
+ wget -nv https://github.com/apache/spark/archive/refs/tags/v3.4.3.tar.gz && \
+ tar --strip-components=1 -xf v3.4.3.tar.gz spark-3.4.3/sql/core/src/test/resources/ && \
mkdir -p shims/spark34/spark_home/ && \
mv sql shims/spark34/spark_home/
;;
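For reference, the CI workflow below drives this helper with the Spark minor version as its only argument. A minimal sketch of exercising the updated branch locally (the `ls` path mirrors the layout the `mkdir`/`mv` steps above create, relative to the script's `INSTALL_DIR`):

```bash
# "3.4" selects the case shown above, which now fetches the Spark 3.4.3
# binary distribution plus the v3.4.3 source tarball for test resources.
bash .github/workflows/util/install_spark_resources.sh 3.4

# Sanity check: the Spark jars should land under the spark34 shim home
# (run from the directory the script used as INSTALL_DIR).
ls shims/spark34/spark_home/assembly/target/scala-2.12/jars | head
```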
10 changes: 5 additions & 5 deletions .github/workflows/velox_backend.yml
@@ -760,15 +760,15 @@ jobs:
working-directory: ${{ github.workspace }}
run: |
mkdir -p '${{ env.CCACHE_DIR }}'
- - name: Prepare spark.test.home for Spark 3.4.2 (other tests)
+ - name: Prepare spark.test.home for Spark 3.4.3 (other tests)
run: |
bash .github/workflows/util/install_spark_resources.sh 3.4
dnf module -y install python39 && \
alternatives --set python3 /usr/bin/python3.9 && \
pip3 install setuptools && \
- pip3 install pyspark==3.4.2 cython && \
+ pip3 install pyspark==3.4.3 cython && \
pip3 install pandas pyarrow
- - name: Build and Run unit test for Spark 3.4.2 (other tests)
+ - name: Build and Run unit test for Spark 3.4.3 (other tests)
run: |
cd $GITHUB_WORKSPACE/
export SPARK_SCALA_VERSION=2.12
@@ -818,10 +818,10 @@ jobs:
working-directory: ${{ github.workspace }}
run: |
mkdir -p '${{ env.CCACHE_DIR }}'
- - name: Prepare spark.test.home for Spark 3.4.2 (slow tests)
+ - name: Prepare spark.test.home for Spark 3.4.3 (slow tests)
run: |
bash .github/workflows/util/install_spark_resources.sh 3.4
- - name: Build and Run unit test for Spark 3.4.2 (slow tests)
+ - name: Build and Run unit test for Spark 3.4.3 (slow tests)
run: |
cd $GITHUB_WORKSPACE/
$MVN_CMD clean test -Pspark-3.4 -Pbackends-velox -Pceleborn -Piceberg -Pdelta -Pspark-ut -Phudi \
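The `pyspark` pin in the first hunk is kept in lockstep with the Spark version under test. A quick hedged check that the pinned wheel matches (plain pip/pyspark usage, nothing Gluten-specific):

```bash
# Install the wheel exactly as the workflow step does.
pip3 install pyspark==3.4.3 cython

# pyspark reports its version via pyspark.__version__; expect "3.4.3".
python3 -c "import pyspark; print(pyspark.__version__)"
```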
20 changes: 10 additions & 10 deletions docs/get-started/Velox.md
@@ -7,19 +7,19 @@ parent: Getting-Started

# Supported Version

- | Type | Version |
- |-------|---------------------------------|
- | Spark | 3.2.2, 3.3.1, 3.4.2, 3.5.1 |
- | OS | Ubuntu20.04/22.04, Centos7/8 |
- | jdk | openjdk8/jdk17 |
- | scala | 2.12 |
+ | Type | Version |
+ |-------|------------------------------|
+ | Spark | 3.2.2, 3.3.1, 3.4.3, 3.5.1 |
+ | OS | Ubuntu20.04/22.04, Centos7/8 |
+ | jdk | openjdk8/jdk17 |
+ | scala | 2.12 |

# Prerequisite

Currently, Gluten+Velox backend is only tested on **Ubuntu20.04/Ubuntu22.04/Centos7/Centos8**.
Other kinds of OS support are still in progress. The long term goal is to support several common OS and conda env deployment.

- Currently, the officially supported Spark versions are 3.2.2, 3.3.1, 3.4.2 and 3.5.1.
+ Currently, the officially supported Spark versions are 3.2.2, 3.3.1, 3.4.3 and 3.5.1.

We need to set up the `JAVA_HOME` env. Currently, Gluten supports **java 8** and **java 17**.

@@ -428,7 +428,7 @@ After the set-up, you can now build Gluten with HBM. Below command is used to en
```bash
cd /path/to/gluten

- ## The script builds four jars for spark 3.2.2, 3.3.1, 3.4.2 and 3.5.1.
+ ## The script builds four jars for spark 3.2.2, 3.3.1, 3.4.3 and 3.5.1.
./dev/buildbundle-veloxbe.sh --enable_hbm=ON
```

@@ -514,7 +514,7 @@ exit
```bash
cd /path/to/gluten

- ## The script builds four jars for spark 3.2.2, 3.3.1, 3.4.2 and 3.5.1.
+ ## The script builds four jars for spark 3.2.2, 3.3.1, 3.4.3 and 3.5.1.
./dev/buildbundle-veloxbe.sh --enable_qat=ON
```

@@ -610,7 +610,7 @@ After the set-up, you can now build Gluten with QAT. Below command is used to en
```bash
cd /path/to/gluten

- ## The script builds four jars for spark 3.2.2, 3.3.1, 3.4.2 and 3.5.1.
+ ## The script builds four jars for spark 3.2.2, 3.3.1, 3.4.3 and 3.5.1.
./dev/buildbundle-veloxbe.sh --enable_iaa=ON
```

2 changes: 1 addition & 1 deletion docs/get-started/build-guide.md
@@ -76,5 +76,5 @@ Its name pattern is `gluten-<backend_type>-bundle-spark<spark.bundle.version>_<
|---------------|----------------------|----------------------|
| 3.2.2 | 3.2 | 2.12 |
| 3.3.1 | 3.3 | 2.12 |
- | 3.4.2 | 3.4 | 2.12 |
+ | 3.4.3 | 3.4 | 2.12 |
| 3.5.1 | 3.5 | 2.12 |
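Combining this table with the name pattern above, a bundle built after this bump can be located with something like the following (the `package/target` output directory is an assumption about the build layout):

```bash
# Spark 3.4.3 maps to bundle version 3.4 and Scala 2.12, so the Velox
# bundle jar should match this pattern once the build finishes.
ls package/target/gluten-velox-bundle-spark3.4_2.12-*.jar
```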
@@ -702,6 +702,7 @@ class VeloxTestSettings extends BackendTestSettings {
enableSuite[GlutenParquetV2PartitionDiscoverySuite]
enableSuite[GlutenParquetProtobufCompatibilitySuite]
enableSuite[GlutenParquetV1QuerySuite]
.exclude("row group skipping doesn't overflow when reading into larger type")
// Unsupport spark.sql.files.ignoreCorruptFiles.
.exclude("Enabling/disabling ignoreCorruptFiles")
// decimal failed ut
@@ -717,6 +718,7 @@ class VeloxTestSettings extends BackendTestSettings {
.exclude(
"SPARK-26677: negated null-safe equality comparison should not filter matched row groups")
enableSuite[GlutenParquetV2QuerySuite]
.exclude("row group skipping doesn't overflow when reading into larger type")
// Unsupport spark.sql.files.ignoreCorruptFiles.
.exclude("Enabling/disabling ignoreCorruptFiles")
// decimal failed ut
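Both new excludes target a Parquet row-group-skipping test that shows up upstream between 3.4.2 and 3.4.3, presumably because it does not yet pass on the Velox backend. A hedged way to re-run just the affected suites (the fully qualified suite name is an assumption; `-DwildcardSuites` is the usual scalatest-maven-plugin switch):

```bash
# Re-run one of the Parquet query suites touched above; adjust the
# package prefix if the Gluten suites live elsewhere.
$MVN_CMD test -Pspark-3.4 -Pbackends-velox -Pspark-ut \
  -DwildcardSuites=org.apache.spark.sql.execution.datasources.parquet.GlutenParquetV1QuerySuite
```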
4 changes: 2 additions & 2 deletions pom.xml
@@ -59,7 +59,7 @@
<scala.version>2.12.15</scala.version>
<spark.major.version>3</spark.major.version>
<sparkbundle.version>3.4</sparkbundle.version>
- <spark.version>3.4.2</spark.version>
+ <spark.version>3.4.3</spark.version>
<sparkshim.module.name>spark34</sparkshim.module.name>
<sparkshim.artifactId>spark-sql-columnar-shims-spark34</sparkshim.artifactId>
<iceberg.version>1.5.0</iceberg.version>
@@ -306,7 +306,7 @@
<sparkbundle.version>3.4</sparkbundle.version>
<sparkshim.module.name>spark34</sparkshim.module.name>
<sparkshim.artifactId>spark-sql-columnar-shims-spark34</sparkshim.artifactId>
- <spark.version>3.4.2</spark.version>
+ <spark.version>3.4.3</spark.version>
<iceberg.version>1.5.0</iceberg.version>
<delta.package.name>delta-core</delta.package.name>
<delta.version>2.4.0</delta.version>
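`spark.version` is declared both in the default properties and in the `spark-3.4` profile, so the two must move together. One way to confirm the resolved value after the bump (`help:evaluate` is the stock maven-help-plugin goal):

```bash
# Should print 3.4.3 when the spark-3.4 profile is active.
mvn -Pspark-3.4 help:evaluate -Dexpression=spark.version -q -DforceStdout
```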
@@ -20,7 +20,7 @@ import org.apache.gluten.sql.shims.{SparkShimDescriptor, SparkShims}
import org.apache.gluten.sql.shims.spark34.SparkShimProvider.DESCRIPTOR

object SparkShimProvider {
- val DESCRIPTOR = SparkShimDescriptor(3, 4, 2)
+ val DESCRIPTOR = SparkShimDescriptor(3, 4, 3)
}

class SparkShimProvider extends org.apache.gluten.sql.shims.SparkShimProvider {
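With this change the spark34 shim declares 3.4.3 as its target version, so the shim layer lines up with a Spark 3.4.3 runtime. A hedged smoke test against the built bundle (the plugin class name is inferred from the `org.apache.gluten` package layout above, and the jar path is an assumption):

```bash
# Launch a local shell with Gluten on the classpath; a version mismatch
# between shim descriptor and runtime would surface here.
spark-shell --master 'local[2]' \
  --conf spark.plugins=org.apache.gluten.GlutenPlugin \
  --jars package/target/gluten-velox-bundle-spark3.4_2.12-*.jar
```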
4 changes: 2 additions & 2 deletions tools/gluten-it/pom.xml
@@ -18,7 +18,7 @@
<maven.compiler.source>${java.version}</maven.compiler.source>
<maven.compiler.target>${java.version}</maven.compiler.target>
<scala.library.version>2.12.17</scala.library.version>
- <spark.version>3.4.2</spark.version>
+ <spark.version>3.4.3</spark.version>
<scala.binary.version>2.12</scala.binary.version>
<spark.major.version>3</spark.major.version>
<celeborn.version>0.3.2-incubating</celeborn.version>
@@ -157,7 +157,7 @@
<profile>
<id>spark-3.4</id>
<properties>
- <spark.version>3.4.2</spark.version>
+ <spark.version>3.4.3</spark.version>
<scala.library.version>2.12.17</scala.library.version>
</properties>
</profile>
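The integration-test tool carries its own `spark.version`, both as the default and inside the `spark-3.4` profile, so it is bumped in both places here. Rebuilding it against the new Spark is a plain Maven invocation (the profile id comes from the pom above):

```bash
cd tools/gluten-it
# Both the default build and the spark-3.4 profile now resolve 3.4.3.
mvn clean package -Pspark-3.4
```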
6 changes: 3 additions & 3 deletions tools/gluten-te/centos/dockerfile-buildenv
@@ -66,9 +66,9 @@ RUN cd /opt && mkdir spark322 && tar -xvf spark-3.2.2-bin-hadoop3.2.tgz -C spark
RUN cd /opt && wget https://archive.apache.org/dist/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3.tgz
RUN cd /opt && mkdir spark331 && tar -xvf spark-3.3.1-bin-hadoop3.tgz -C spark331 --strip-components=1

# Build & install Spark 3.4.2
RUN cd /opt && wget https://archive.apache.org/dist/spark/spark-3.4.2/spark-3.4.2-bin-hadoop3.tgz
RUN cd /opt && mkdir spark342 && tar -xvf spark-3.4.2-bin-hadoop3.tgz -C spark342 --strip-components=1
# Build & install Spark 3.4.3
RUN cd /opt && wget https://archive.apache.org/dist/spark/spark-3.4.3/spark-3.4.3-bin-hadoop3.tgz
RUN cd /opt && mkdir spark342 && tar -xvf spark-3.4.3-bin-hadoop3.tgz -C spark342 --strip-components=1

# Build & install Spark 3.5.1
RUN cd /opt && wget https://archive.apache.org/dist/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz
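Note that this CentOS hunk keeps the `spark342` directory name even though the tarball is now 3.4.3, whereas the Ubuntu build env below renames the extraction directory to `spark343`.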
6 changes: 3 additions & 3 deletions tools/gluten-te/ubuntu/dockerfile-buildenv
@@ -119,9 +119,9 @@ RUN cd /opt && mkdir spark322 && tar -xvf spark-3.2.2-bin-hadoop3.2.tgz -C spark
RUN cd /opt && wget https://archive.apache.org/dist/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3.tgz
RUN cd /opt && mkdir spark331 && tar -xvf spark-3.3.1-bin-hadoop3.tgz -C spark331 --strip-components=1

# Build & install Spark 3.4.2
RUN cd /opt && wget https://archive.apache.org/dist/spark/spark-3.4.2/spark-3.4.2-bin-hadoop3.tgz
RUN cd /opt && mkdir spark342 && tar -xvf spark-3.4.2-bin-hadoop3.tgz -C spark342 --strip-components=1
# Build & install Spark 3.4.3
RUN cd /opt && wget https://archive.apache.org/dist/spark/spark-3.4.3/spark-3.4.3-bin-hadoop3.tgz
RUN cd /opt && mkdir spark343 && tar -xvf spark-3.4.3-bin-hadoop3.tgz -C spark343 --strip-components=1

# Build & install Spark 3.5.1
RUN cd /opt && wget https://archive.apache.org/dist/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz
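To actually pick up Spark 3.4.3 inside the build environment, the image needs a rebuild; a hedged example (the tag is illustrative, and gluten-te presumably drives this through its own wrapper scripts):

```bash
# Rebuild the Ubuntu build-env image so /opt/spark343 exists inside it.
docker build -t gluten-te-ubuntu-buildenv:spark-3.4.3 \
  -f tools/gluten-te/ubuntu/dockerfile-buildenv tools/gluten-te/ubuntu
```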
