Commit

[GLUTEN-4260][VL] Fix velox build summary by covering new build options (#4277)
PHILO-HE authored Jan 4, 2024
1 parent d448e59 commit 0282319
Showing 6 changed files with 97 additions and 73 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -101,7 +101,7 @@ If you want to use Gluten **ClickHouse** backend, see [Build with ClickHouse Bac

### 3.2.3 Build options

-See [Gluten Usage](./docs/get-started/GlutenUsage.md).
+See [Gluten build guide](./docs/get-started/build-guide.md).

# 4 Contribution

4 changes: 2 additions & 2 deletions dev/builddeps-veloxbe.sh
@@ -141,12 +141,12 @@ function concat_velox_param {

# check velox branch
if [[ -n $VELOX_BRANCH ]]; then
-VELOX_PARAMETER+="--velox_branch=$VELOX_BRANCH "
+VELOX_PARAMETER+="--velox_branch=$VELOX_BRANCH "
fi

# check velox home
if [[ -n $VELOX_HOME ]]; then
-VELOX_PARAMETER+="--velox_home=$VELOX_HOME "
+VELOX_PARAMETER+="--velox_home=$VELOX_HOME "
fi
}

61 changes: 0 additions & 61 deletions docs/get-started/GlutenUsage.md

This file was deleted.

6 changes: 3 additions & 3 deletions docs/get-started/Velox.md
@@ -42,8 +42,8 @@ git clone https://github.com/oap-project/gluten.git

# Build Gluten with Velox Backend

-It's recommended to use buildbundle-veloxbe.sh and build gluten in one script.
-[Gluten Usage](./GlutenUsage.md) listed the parameters and their default value of build command for your reference.
+It's recommended to use buildbundle-veloxbe.sh to build gluten in one script.
+[Gluten build guide](./build-guide.md) listed the parameters and their default value of build command for your reference.

**For x86_64 build**

@@ -758,7 +758,7 @@ cat tpch_parquet.scala | spark-shell --name tpch_powertest_velox \
--conf spark.driver.maxResultSize=32g
```

-Refer to [Gluten parameters ](../Configuration.md) for more details of each parameter used by Gluten.
+Refer to [Gluten configuration](../Configuration.md) for more details.

## Result
*wholestagetransformer* indicates that the offload works.
76 changes: 76 additions & 0 deletions docs/get-started/build-guide.md
@@ -0,0 +1,76 @@
---
layout: page
title: Build Parameters for Velox Backend
nav_order: 4
parent: Getting-Started
---
## Build Parameters
### Native build parameters for buildbundle-veloxbe.sh or builddeps-veloxbe.sh
Please set them via `--`, e.g. `--build_type=Release`.

| Parameters | Description | Default |
|------------------------|----------------------------------------------------------------------------|---------|
| build_type | Build type for Velox & gluten cpp, CMAKE_BUILD_TYPE. | Release |
| build_tests | Build gluten cpp tests. | OFF |
| build_examples | Build udf example. | OFF |
| build_benchmarks | Build gluten cpp benchmarks. | OFF |
| build_jemalloc | Build with jemalloc. | ON |
| build_protobuf | Build protobuf lib. | ON |
| enable_qat | Enable QAT for shuffle data de/compression. | OFF |
| enable_iaa | Enable IAA for shuffle data de/compression. | OFF |
| enable_hbm | Enable HBM allocator. | OFF |
| enable_s3 | Build with S3 support. | OFF |
| enable_gcs | Build with GCS support. | OFF |
| enable_hdfs | Build with HDFS support. | OFF |
| enable_abfs | Build with ABFS support. | OFF |
| enable_ep_cache | Enable caching for external project build (Velox). | OFF |
| enable_vcpkg | Enable vcpkg for static build. | OFF |
| run_setup_script | Run setup script to install Velox dependencies. | ON |
| velox_repo | Specify your own Velox repo to build. | "" |
| velox_branch | Specify your own Velox branch to build. | "" |
| velox_home | Specify your own Velox source path to build. | "" |
| build_velox_tests | Build Velox tests. | OFF |
| build_velox_benchmarks | Build Velox benchmarks (velox_tests and connectors will be disabled if ON) | OFF |
| compile_arrow_java | Compile arrow java for gluten build to use to fix invalid pointer issues. | OFF |
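
As a rough illustration of the `--key=value` convention described above, the following is a minimal sketch of how such options might be parsed into shell variables. It is not the actual code of buildbundle-veloxbe.sh or builddeps-veloxbe.sh: the function name `parse_build_args` is hypothetical, and only three of the options from the table are handled, using the defaults the table lists.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: parse `--key=value` build options into variables.
# Defaults follow the table above; only three options are covered.
BUILD_TYPE=Release
ENABLE_S3=OFF
ENABLE_HDFS=OFF

parse_build_args() {
  for arg in "$@"; do
    case "$arg" in
      # "${arg#*=}" strips everything up to and including the first "=".
      --build_type=*)  BUILD_TYPE="${arg#*=}" ;;
      --enable_s3=*)   ENABLE_S3="${arg#*=}" ;;
      --enable_hdfs=*) ENABLE_HDFS="${arg#*=}" ;;
      *) echo "Unknown option: $arg" >&2; return 1 ;;
    esac
  done
}

parse_build_args --build_type=Debug --enable_s3=ON
echo "BUILD_TYPE=$BUILD_TYPE ENABLE_S3=$ENABLE_S3 ENABLE_HDFS=$ENABLE_HDFS"
```

Options that are not passed keep their defaults, which is why `ENABLE_HDFS` stays `OFF` in this run.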

### Velox build parameters for build_velox.sh
Please set them via `--`, e.g., `--velox_home=/YOUR/PATH`.

| Parameters | Description | Default |
|--------------------|-------------------------------------------------------------------------|------------------------------------------|
| velox_home | Specify Velox source path to build. | GLUTEN_SRC/ep/build-velox/build/velox_ep |
| build_type | Velox build type, i.e., CMAKE_BUILD_TYPE. | Release |
| enable_s3 | Build Velox with S3 support. | OFF |
| enable_gcs | Build Velox with GCS support. | OFF |
| enable_hdfs | Build Velox with HDFS support. | OFF |
| enable_abfs | Build Velox with ABFS support. | OFF |
| run_setup_script | Run setup script to install Velox dependencies before build. | ON |
| enable_ep_cache | Enable and reuse cache of Velox build. | OFF |
| build_test_utils | Build Velox with cmake arg -DVELOX_BUILD_TEST_UTILS=ON if ON. | OFF |
| build_tests | Build Velox test. | OFF |
| build_benchmarks | Build Velox benchmarks. | OFF |
| compile_arrow_java | Build arrow java for gluten build to use to fix invalid pointer issues. | OFF |

### Maven build parameters
The below parameters can be set via `-P` for mvn.

| Parameters | Description | Default state |
|---------------------|------------------------------------------------------------------------------|---------------|
| backends-velox | Build Gluten Velox backend. | disabled |
| backends-clickhouse | Build Gluten ClickHouse backend. | disabled |
| rss | Build Gluten with Remote Shuffle Service, only applicable for Velox backend. | disabled |
| delta | Build Gluten with Delta Lake support. | disabled |
| iceberg | Build Gluten with Iceberg support. | disabled |
| spark-3.2 | Build Gluten for Spark 3.2. | enabled |
| spark-3.3 | Build Gluten for Spark 3.3. | disabled |
| spark-3.4 | Build Gluten for Spark 3.4. | disabled |
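
To make the `-P` convention concrete, here is a small sketch that turns a list of profile names from the table into Maven flags. The helper `mvn_profile_flags` is hypothetical, and the command is printed rather than executed; `clean package` and `-DskipTests` are ordinary Maven usage, chosen here only as an example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn profile names into Maven -P flags.
mvn_profile_flags() {
  flags=""
  for profile in "$@"; do
    flags="$flags -P$profile"
  done
  # Strip the leading space added by the loop.
  echo "${flags# }"
}

# Print (rather than run) a build command for the Velox backend on Spark 3.3.
echo "mvn clean package $(mvn_profile_flags backends-velox spark-3.3) -DskipTests"
```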

## Gluten Jar for Deployment
The built Gluten jar is placed under `GLUTEN_SRC/package/target/`.
Its name pattern is `gluten-<backend_type>-bundle-spark<spark.bundle.version>_<scala.binary.version>-<os.detected.release>_<os.detected.release.version>-<project.version>.jar`.

| Spark Version | spark.bundle.version | scala.binary.version |
|---------------|----------------------|----------------------|
| 3.2.2 | 3.2 | 2.12 |
| 3.3.1 | 3.3 | 2.12 |
| 3.4.2 | 3.4 | 2.12 |
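
The name pattern can be checked by filling in its placeholders. The sketch below composes a jar name from its components; the helper `gluten_jar_name` is hypothetical, and the OS release (`ubuntu`), OS version (`22.04`) and project version (`1.1.0`) are made-up example values, not taken from this commit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compose the bundle jar name from its components,
# following the pattern given above.
gluten_jar_name() {
  backend=$1          # <backend_type>
  spark_bundle=$2     # <spark.bundle.version>
  scala_binary=$3     # <scala.binary.version>
  os_release=$4       # <os.detected.release>
  os_version=$5       # <os.detected.release.version>
  project_version=$6  # <project.version>
  echo "gluten-${backend}-bundle-spark${spark_bundle}_${scala_binary}-${os_release}_${os_version}-${project_version}.jar"
}

# Example values only; the OS and project versions are assumptions.
gluten_jar_name velox 3.3 2.12 ubuntu 22.04 1.1.0
```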
21 changes: 15 additions & 6 deletions ep/build-velox/src/build_velox.sh
@@ -15,23 +15,27 @@
# limitations under the License.

set -exu
-#Set on run gluten on S3
+# New build option may need to be included in get_build_summary to ensure EP build cache workable.
+# Enable S3 connector.
ENABLE_S3=OFF
-#Set on run gluten on GCS
+# Enable GCS connector.
ENABLE_GCS=OFF
-#Set on run gluten on HDFS
+# Enable HDFS connector.
ENABLE_HDFS=OFF
-#Set on run gluten on ABFS
+# Enable ABFS connector.
ENABLE_ABFS=OFF
BUILD_TYPE=release
VELOX_HOME=""
ENABLE_EP_CACHE=OFF
+# May be deprecated in Gluten build.
ENABLE_BENCHMARK=OFF
+# May be deprecated in Gluten build.
ENABLE_TESTS=OFF
+# Set to ON for gluten cpp test build.
BUILD_TEST_UTILS=OFF
RUN_SETUP_SCRIPT=ON
-OTHER_ARGUMENTS=""
COMPILE_ARROW_JAVA=OFF
+OTHER_ARGUMENTS=""

OS=`uname -s`
ARCH=`uname -m`
@@ -171,7 +175,12 @@ function compile {
function get_build_summary {
COMMIT_HASH=$1
# Ideally all script arguments should be put into build summary.
-echo "ENABLE_S3=$ENABLE_S3,ENABLE_GCS=$ENABLE_GCS,ENABLE_HDFS=$ENABLE_HDFS,BUILD_TYPE=$BUILD_TYPE,VELOX_HOME=$VELOX_HOME,ENABLE_EP_CACHE=$ENABLE_EP_CACHE,ENABLE_BENCHMARK=$ENABLE_BENCHMARK,ENABLE_TESTS=$ENABLE_TESTS,RUN_SETUP_SCRIPT=$RUN_SETUP_SCRIPT,OTHER_ARGUMENTS=$OTHER_ARGUMENTS,COMMIT_HASH=$COMMIT_HASH"
+# ENABLE_EP_CACHE is excluded. Thus, in current build with ENABLE_EP_CACHE=ON, we can use EP cache
+# from last build with ENABLE_EP_CACHE=OFF.
+echo "ENABLE_S3=$ENABLE_S3,ENABLE_GCS=$ENABLE_GCS,ENABLE_HDFS=$ENABLE_HDFS,ENABLE_ABFS=$ENABLE_ABFS,\
+BUILD_TYPE=$BUILD_TYPE,VELOX_HOME=$VELOX_HOME,ENABLE_BENCHMARK=$ENABLE_BENCHMARK,\
+ENABLE_TESTS=$ENABLE_TESTS,BUILD_TEST_UTILS=$BUILD_TEST_UTILS,RUN_SETUP_SCRIPT=$RUN_SETUP_SCRIPT,\
+COMPILE_ARROW_JAVA=$COMPILE_ARROW_JAVA,OTHER_ARGUMENTS=$OTHER_ARGUMENTS,COMMIT_HASH=$COMMIT_HASH"
}
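
The reason every build option should appear in the summary can be shown with a small, self-contained sketch. It assumes (this is not shown in the diff) that the summary string is stored next to the cached build and compared on the next run; `cache_is_valid`, the temporary file and the commit hash `abc123` are hypothetical, and the summary is reduced to three options for brevity.

```bash
#!/usr/bin/env bash
# Sketch only, not the script's actual logic: reuse a cached build only
# when the stored summary equals the summary of the current options.
ENABLE_S3=OFF
ENABLE_ABFS=OFF
BUILD_TYPE=release

get_build_summary() {
  commit_hash=$1
  echo "ENABLE_S3=$ENABLE_S3,ENABLE_ABFS=$ENABLE_ABFS,BUILD_TYPE=$BUILD_TYPE,COMMIT_HASH=$commit_hash"
}

# Returns 0 if the summary stored in file $1 matches the current summary for commit $2.
cache_is_valid() {
  stored_file=$1
  [ -f "$stored_file" ] && [ "$(cat "$stored_file")" = "$(get_build_summary "$2")" ]
}

summary_file=$(mktemp)   # hypothetical location of the stored summary
get_build_summary abc123 > "$summary_file"
cache_is_valid "$summary_file" abc123 && echo "cache hit"

# Changing an option that is part of the summary invalidates the cache.
ENABLE_ABFS=ON
cache_is_valid "$summary_file" abc123 || echo "cache miss: options changed"
rm -f "$summary_file"
```

If `ENABLE_ABFS` were left out of the summary, the second check would still report a hit and a build without ABFS support could be reused by mistake.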

function check_commit {