[VL] Add a way to create static build with docker container and gluten-te #6457

Merged: 19 commits, Jul 16, 2024
1 change: 0 additions & 1 deletion .github/workflows/clickhouse_be_trigger.yml
@@ -29,7 +29,6 @@ on:
 - 'gluten-ut/**'
 - 'shims/**'
 - 'tools/gluten-it/**'
-- 'tools/gluten-te/**'
 - 'cpp-ch/**'

 jobs:
1 change: 0 additions & 1 deletion .github/workflows/velox_docker.yml
@@ -33,7 +33,6 @@ on:
 - 'gluten-ut/**'
 - 'shims/**'
 - 'tools/gluten-it/**'
-- 'tools/gluten-te/**'
 - 'ep/build-velox/**'
 - 'cpp/*'
 - 'cpp/CMake/**'
20 changes: 14 additions & 6 deletions tools/gluten-te/centos/dockerfile-buildenv
@@ -58,13 +58,21 @@ RUN wget https://downloads.apache.org/maven/maven-3/3.8.8/binaries/apache-maven-
 rm apache-maven-3.8.8-bin.tar.gz && \
 mv apache-maven-3.8.8 /usr/lib/maven

-# # Build & install Spark 3.2.2
-# RUN cd /opt && git clone --depth 1 --branch v3.2.2 https://github.com/apache/spark.git spark322
-# RUN cd /opt/spark322 && ./build/mvn -Pyarn -DskipTests clean install
+# Build & install Spark 3.2.2
+RUN cd /opt && wget https://archive.apache.org/dist/spark/spark-3.2.2/spark-3.2.2-bin-hadoop3.2.tgz
+RUN cd /opt && mkdir spark322 && tar -xvf spark-3.2.2-bin-hadoop3.2.tgz -C spark322 --strip-components=1

-# # Build & install Spark 3.3.1
-# RUN cd /opt && git clone --depth 1 --branch v3.3.1 https://github.com/apache/spark.git spark331
-# RUN cd /opt/spark331 && ./build/mvn -Pyarn -DskipTests clean install
+# Build & install Spark 3.3.1
+RUN cd /opt && wget https://archive.apache.org/dist/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3.tgz
+RUN cd /opt && mkdir spark331 && tar -xvf spark-3.3.1-bin-hadoop3.tgz -C spark331 --strip-components=1
+
+# Build & install Spark 3.4.2
+RUN cd /opt && wget https://archive.apache.org/dist/spark/spark-3.4.2/spark-3.4.2-bin-hadoop3.tgz
+RUN cd /opt && mkdir spark342 && tar -xvf spark-3.4.2-bin-hadoop3.tgz -C spark342 --strip-components=1
+
+# Build & install Spark 3.5.1
+RUN cd /opt && wget https://archive.apache.org/dist/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz
+RUN cd /opt && mkdir spark351 && tar -xvf spark-3.5.1-bin-hadoop3.tgz -C spark351 --strip-components=1

 ENV PATH="$PATH:/usr/lib/maven/bin"
 ENV LD_LIBRARY_PATH=/usr/local/lib64:/usr/local/lib:/usr/lib64:/usr/lib:/lib64:/lib
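The Dockerfile changes above switch from building Spark with Maven to downloading a prebuilt distribution and unpacking it with `tar --strip-components=1`, which flattens the archive's single top-level directory into the target. The following sketch demonstrates that extraction shape on a locally created archive (no network needed); the file names mimic a Spark distribution but the archive itself is fabricated for illustration:

```shell
#!/bin/bash
# Demonstrates the tar --strip-components=1 pattern used by the Dockerfiles'
# Spark install steps, on a locally created archive.
set -e

work=$(mktemp -d)
cd "$work"

# Simulate a Spark binary distribution: a tarball with one top-level directory.
mkdir -p spark-3.5.1-bin-hadoop3/bin
echo 'echo spark-submit' > spark-3.5.1-bin-hadoop3/bin/spark-submit
tar -czf spark-3.5.1-bin-hadoop3.tgz spark-3.5.1-bin-hadoop3

# Same shape as the Dockerfile: --strip-components=1 drops the top-level
# directory, so binaries land at spark351/bin rather than
# spark351/spark-3.5.1-bin-hadoop3/bin.
mkdir spark351
tar -xf spark-3.5.1-bin-hadoop3.tgz -C spark351 --strip-components=1

ls spark351/bin/spark-submit
```

This avoids a lengthy Maven build inside the image and pins the exact released binaries.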
1 change: 0 additions & 1 deletion tools/gluten-te/ubuntu/defaults.conf
@@ -37,7 +37,6 @@ DEFAULT_OS_IMAGE_NAME=ubuntu
 # Version ID of os image
 DEFAULT_OS_IMAGE_TAG=20.04

-
 # Set timezone name
 DEFAULT_TIMEZONE=Asia/Shanghai

16 changes: 8 additions & 8 deletions tools/gluten-te/ubuntu/dockerfile-buildenv
@@ -104,20 +104,20 @@ RUN set-login-env "LANGUAGE=en_US:en"
 RUN set-login-env "LC_ALL=en_US.UTF-8"

 # Build & install Spark 3.2.2
-RUN cd /opt && git clone --depth 1 --branch v3.2.2 https://github.com/apache/spark.git spark322
-RUN cd /opt/spark322 && ./build/mvn -Pyarn -DskipTests clean install
+RUN cd /opt && wget https://archive.apache.org/dist/spark/spark-3.2.2/spark-3.2.2-bin-hadoop3.2.tgz
+RUN cd /opt && mkdir spark322 && tar -xvf spark-3.2.2-bin-hadoop3.2.tgz -C spark322 --strip-components=1

 # Build & install Spark 3.3.1
-RUN cd /opt && git clone --depth 1 --branch v3.3.1 https://github.com/apache/spark.git spark331
-RUN cd /opt/spark331 && ./build/mvn -Pyarn -DskipTests clean install
+RUN cd /opt && wget https://archive.apache.org/dist/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3.tgz
+RUN cd /opt && mkdir spark331 && tar -xvf spark-3.3.1-bin-hadoop3.tgz -C spark331 --strip-components=1

 # Build & install Spark 3.4.2
-RUN cd /opt && git clone --depth 1 --branch v3.4.2 https://github.com/apache/spark.git spark342
-RUN cd /opt/spark342 && ./build/mvn -Pyarn -DskipTests clean install
+RUN cd /opt && wget https://archive.apache.org/dist/spark/spark-3.4.2/spark-3.4.2-bin-hadoop3.tgz
+RUN cd /opt && mkdir spark342 && tar -xvf spark-3.4.2-bin-hadoop3.tgz -C spark342 --strip-components=1

 # Build & install Spark 3.5.1
-# RUN cd /opt && git clone --depth 1 --branch v3.5.1 https://github.com/apache/spark.git spark351
-# RUN cd /opt/spark351 && ./build/mvn -Pyarn -DskipTests clean install
+RUN cd /opt && wget https://archive.apache.org/dist/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz
+RUN cd /opt && mkdir spark351 && tar -xvf spark-3.5.1-bin-hadoop3.tgz -C spark351 --strip-components=1

 # Prepare entry command
 COPY scripts/cmd.sh /root/.cmd.sh
@@ -0,0 +1,33 @@
# Utility for building C++ libs in a container

This folder contains scripts that build `libvelox.so` and `libgluten.so` inside a Docker container for use on the host.

## Prerequisites

1. x86 CPU architecture
2. A host machine running Linux
3. Docker

## Usage

```sh
# 1. Set the following environment variables if you are behind an HTTP proxy.
export HTTP_PROXY_HOST=myproxy.example.com
export HTTP_PROXY_PORT=55555

# 2. Build the C++ libs in an Ubuntu 20.04 Docker container.
# Note: this command can take a long time to finish if it has never been run
# before. After the first run, the essential build environment is cached by
# the Docker builder.
#
# Also note that changes to HTTP_PROXY_HOST / HTTP_PROXY_PORT can invalidate
# the build cache as well. For details, see the Docker file `dockerfile-buildenv`.
cd gluten/
tools/gluten-te/ubuntu/examples/buildhere-veloxbe-portable-libs/run.sh

# 3. Check the built libs.
ls -l cpp/build/releases/

# 4. To build Gluten's bundled jar, continue with the usual Maven commands.
# For example:
mvn clean install -P spark-3.4,backends-velox -DskipTests
```
@@ -0,0 +1,25 @@
#!/bin/bash
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -ex

BASEDIR=$(readlink -f "$(dirname "$0")")

TIMESTAMP=$(date +%s)

export EXTRA_DOCKER_OPTIONS="--name buildhere-veloxbe-portable-libs-$TIMESTAMP -v $BASEDIR/scripts:/opt/scripts"

"$BASEDIR"/../../cbash-mount.sh '/opt/scripts/all.sh'
@@ -0,0 +1,48 @@
#!/bin/bash
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -ex

export NUM_THREADS=$(nproc)
export CMAKE_BUILD_PARALLEL_LEVEL=$(nproc)

# Retry helper, adapted from https://unix.stackexchange.com/a/137639.
function fail {
  echo "$1" >&2
  exit 1
}

function retry {
  local n=1
  local max=5
  local delay=15
  while true; do
    "$@" && break || {
      if [[ $n -lt $max ]]; then
        ((n++))
        echo "Command failed. Attempt $n/$max:"
        sleep $delay
      else
        fail "The command has failed after $n attempts."
      fi
    }
  done
}

cd /opt/gluten
retry apt-get install -y curl zip unzip tar pkg-config autoconf-archive bison flex
retry source ./dev/vcpkg/env.sh
retry dev/builddeps-veloxbe.sh --build_tests=OFF --build_benchmarks=OFF --enable_s3=ON --enable_gcs=ON --enable_hdfs=ON --enable_abfs=ON
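The `retry` function above can be exercised standalone. This minimal sketch (with the delay shortened to 1 second for illustration, and a hypothetical `flaky` command that fails on its first two invocations) shows how a transient failure is absorbed:

```shell
#!/bin/bash
# Standalone sketch of the retry helper above, with delay shortened to 1s.

function fail {
  echo "$1" >&2
  exit 1
}

function retry {
  local n=1
  local max=5
  local delay=1
  while true; do
    "$@" && break || {
      if [[ $n -lt $max ]]; then
        ((n++))
        echo "Command failed. Attempt $n/$max:"
        sleep $delay
      else
        fail "The command has failed after $n attempts."
      fi
    }
  done
}

# Hypothetical flaky command: fails on its first two invocations, then succeeds.
attempts=0
flaky() {
  attempts=$((attempts + 1))
  [ "$attempts" -ge 3 ]
}

retry flaky
echo "succeeded after $attempts attempts"
```

Because the function runs `"$@"` in the current shell, it also works for `source` as used above, though wrapping `source` in a retry only helps if the sourced script is idempotent.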