Skip to content

Commit

Permalink
Merge branch 'main' into GLUTEN-3929
Browse files Browse the repository at this point in the history
  • Loading branch information
fyp711 authored Dec 5, 2023
2 parents 8e0e796 + dfb985c commit d296138
Show file tree
Hide file tree
Showing 336 changed files with 10,534 additions and 5,993 deletions.
9 changes: 7 additions & 2 deletions .github/workflows/stale.yml
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,12 @@ jobs:
steps:
- uses: actions/stale@v8
with:
stale-pr-message: 'This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.'
close-pr-message: 'This PR was closed because it has been stalled for 10 days with no activity.'
days-before-pr-stale: 45
days-before-pr-close: 10
stale-pr-message: 'This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.'
close-pr-message: 'This PR was auto-closed because it has been stalled for 10 days with no activity. Please feel free to reopen if it's still valid. Thanks.'
days-before-issue-stale: -1 # disabled
days-before-issue-close: -1 # disabled
stale-issue-message: 'This issue is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.'
close-issue-message: 'This issue was auto-closed because it has been stalled for 10 days with no activity. Please feel free to reopen if it's still valid. Thanks.'
operations-per-run: 300
53 changes: 25 additions & 28 deletions .github/workflows/velox_be.yml
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@ concurrency:
cancel-in-progress: true

jobs:

ubuntu2004-test-spark32:
runs-on: velox-self-hosted
steps:
Expand Down Expand Up @@ -87,8 +88,7 @@ jobs:
if: ${{ always() }}
run: |
docker stop ubuntu2004-test-$GITHUB_RUN_ID || true
ubuntu2004-test-slow-spark32:
ubuntu2004-test-spark32-slow:
runs-on: velox-self-hosted
steps:
- uses: actions/checkout@v4
Expand Down Expand Up @@ -125,7 +125,6 @@ jobs:
if: ${{ always() }}
run: |
docker stop ubuntu2004-test-slow-$GITHUB_RUN_ID || true
ubuntu2004-test-spark33-slow:
runs-on: velox-self-hosted
steps:
Expand Down Expand Up @@ -162,7 +161,6 @@ jobs:
if: ${{ always() }}
run: |
docker stop ubuntu2004-test-spark33-slow-$GITHUB_RUN_ID || true
ubuntu2004-test-spark33:
runs-on: velox-self-hosted
steps:
Expand Down Expand Up @@ -229,7 +227,6 @@ jobs:
if: ${{ always() }}
run: |
docker stop ubuntu2004-test-spark34-slow-$GITHUB_RUN_ID || true
ubuntu2004-test-spark34:
runs-on: velox-self-hosted
steps:
Expand Down Expand Up @@ -259,7 +256,6 @@ jobs:
if: ${{ always() }}
run: |
docker stop ubuntu2004-test-spark34-$GITHUB_RUN_ID || true
ubuntu2204-test:
runs-on: velox-self-hosted
steps:
Expand All @@ -273,13 +269,13 @@ jobs:
run: |
docker exec ubuntu2204-test-$GITHUB_RUN_ID bash -c '
cd /opt/gluten/ep/build-velox/src && \
./get_velox.sh --velox_home=/opt/velox --enable_hdfs=ON --enable_s3=ON && \
./build_velox.sh --velox_home=/opt/velox --enable_ep_cache=ON --enable_hdfs=ON --enable_s3=ON'
./get_velox.sh --velox_home=/opt/velox --enable_hdfs=ON --enable_s3=ON --enable_gcs=ON && \
./build_velox.sh --velox_home=/opt/velox --enable_ep_cache=ON --enable_hdfs=ON --enable_s3=ON --enable_gcs=ON'
- name: Build Gluten CPP library
run: |
docker exec ubuntu2204-test-$GITHUB_RUN_ID bash -c '
cd /opt/gluten/cpp && \
./compile.sh --build_velox_backend=ON --velox_home=/opt/velox --enable_hdfs=ON --enable_s3=ON'
./compile.sh --build_velox_backend=ON --velox_home=/opt/velox --enable_hdfs=ON --enable_s3=ON --enable_gcs=ON'
- name: Build for Spark 3.2.2
run: |
docker exec ubuntu2204-test-$GITHUB_RUN_ID bash -c '
Expand Down Expand Up @@ -337,7 +333,6 @@ jobs:
if: ${{ always() }}
run: |
docker stop ubuntu2204-test-$GITHUB_RUN_ID || true
centos8-test:
runs-on: velox-self-hosted
steps:
Expand All @@ -353,14 +348,14 @@ jobs:
source /env.sh && \
sudo yum -y install patch && \
cd /opt/gluten/ep/build-velox/src && \
./get_velox.sh --velox_home=/opt/velox --enable_hdfs=ON --enable_s3=ON && \
./build_velox.sh --velox_home=/opt/velox --enable_ep_cache=ON --enable_hdfs=ON --enable_s3=ON'
./get_velox.sh --velox_home=/opt/velox --enable_hdfs=ON --enable_s3=ON --enable_gcs=ON && \
./build_velox.sh --velox_home=/opt/velox --enable_ep_cache=ON --enable_hdfs=ON --enable_s3=ON --enable_gcs=ON'
- name: Build Gluten CPP library
run: |
docker exec centos8-test-$GITHUB_RUN_ID bash -c '
source /env.sh && \
cd /opt/gluten/cpp && \
./compile.sh --build_velox_backend=ON --velox_home=/opt/velox --enable_hdfs=ON --enable_s3=ON'
./compile.sh --build_velox_backend=ON --velox_home=/opt/velox --enable_hdfs=ON --enable_s3=ON --enable_gcs=ON'
- name: Build for Spark 3.2.2
run: |
docker exec centos8-test-$GITHUB_RUN_ID bash -c '
Expand All @@ -374,14 +369,14 @@ jobs:
--local --preset=velox --benchmark-type=h --error-on-memleak --disable-aqe --off-heap-size=10g -s=1.0 --threads=16 --iterations=1 \
&& GLUTEN_IT_JVM_ARGS=-Xmx50G sbin/gluten-it.sh queries-compare \
--local --preset=velox --benchmark-type=ds --error-on-memleak --off-heap-size=50g -s=30.0 --threads=32 --iterations=1'
- name: (To be fixed) TPC-H SF1.0 && TPC-DS SF30.0 Parquet local spark3.2 random kill tasks
- name: TPC-H SF1.0 && TPC-DS SF30.0 Parquet local spark3.2 random kill tasks
run: |
docker exec centos8-test-$GITHUB_RUN_ID bash -c 'cd /opt/gluten/tools/gluten-it && \
mvn clean install -Pspark-3.2 \
&& GLUTEN_IT_JVM_ARGS=-Xmx5G sbin/gluten-it.sh queries \
--local --preset=velox --benchmark-type=h --error-on-memleak --disable-aqe --off-heap-size=10g -s=1.0 --threads=16 --iterations=1 --skip-data-gen --random-kill-tasks \
&& GLUTEN_IT_JVM_ARGS=-Xmx50G sbin/gluten-it.sh queries \
--local --preset=velox --benchmark-type=ds --error-on-memleak --off-heap-size=50g -s=30.0 --threads=32 --iterations=1 --skip-data-gen --random-kill-tasks' || true
--local --preset=velox --benchmark-type=ds --error-on-memleak --off-heap-size=50g -s=30.0 --threads=32 --iterations=1 --skip-data-gen --random-kill-tasks'
- name: Exit docker container
if: ${{ always() }}
run: |
Expand All @@ -393,23 +388,25 @@ jobs:
- uses: actions/checkout@v4
- name: Setup docker container
run: |
docker run --rm --init --privileged --ulimit nofile=65536:65536 --ulimit core=-1 --security-opt seccomp=unconfined \
-v $PWD:/opt/gluten --name centos7-test-$GITHUB_RUN_ID -e NUM_THREADS=30 -detach 10.0.2.4:5000/gluten-dev/centos:7 \
bash -c 'cd /opt/gluten && sleep 14400'
EXTRA_DOCKER_OPTIONS="--name centos7-test-$GITHUB_RUN_ID -e NUM_THREADS=30 --detach" \
NON_INTERACTIVE=ON \
MOUNT_MAVEN_CACHE=OFF \
OS_IMAGE=centos:7 \
OS_VERSION=7 \
tools/gluten-te/centos/cbash.sh sleep 14400
- name: Build Gluten velox third party
run: |
docker exec centos7-test-$GITHUB_RUN_ID bash -c '
source /env.sh && \
sudo yum -y install patch && \
yum -y install epel-release centos-release-scl patch sudo && \
cd /opt/gluten/ep/build-velox/src && \
./get_velox.sh --velox_home=/opt/velox && \
./build_velox.sh --run_setup_script=OFF --velox_home=/opt/velox --enable_ep_cache=ON'
./get_velox.sh --velox_home=/opt/velox --enable_hdfs=ON --enable_s3=ON --enable_gcs=ON && \
./build_velox.sh --velox_home=/opt/velox --enable_ep_cache=ON --enable_s3=ON --enable_gcs=ON --enable_hdfs=ON'
- name: Build Gluten CPP library
run: |
docker exec centos7-test-$GITHUB_RUN_ID bash -c '
source /env.sh && \
cd /opt/gluten/cpp && \
./compile.sh --build_velox_backend=ON --velox_home=/opt/velox'
source /opt/rh/devtoolset-9/enable && \
./compile.sh --build_velox_backend=ON --velox_home=/opt/velox --enable_hdfs=ON --enable_s3=ON --enable_gcs=ON'
- name: Build for Spark 3.2.2
run: |
docker exec centos7-test-$GITHUB_RUN_ID bash -c '
Expand Down Expand Up @@ -447,7 +444,7 @@ jobs:
-d=OFFHEAP_SIZE:3g,spark.memory.offHeap.size=3g \
-d=OVER_ACQUIRE:0.3,spark.gluten.memory.overAcquiredMemoryRatio=0.3 \
-d=OVER_ACQUIRE:0.5,spark.gluten.memory.overAcquiredMemoryRatio=0.5' || true
- name: (To be fixed) TPC-DS SF30.0 Parquet local spark3.2 Q23A/Q23B low memory
- name: TPC-DS SF30.0 Parquet local spark3.2 Q23A/Q23B low memory
run: |
docker exec centos7-test-$GITHUB_RUN_ID bash -c 'cd /opt/gluten/tools/gluten-it && \
GLUTEN_IT_JVM_ARGS=-Xmx50G sbin/gluten-it.sh parameterized \
Expand All @@ -458,8 +455,8 @@ jobs:
-d=OFFHEAP_SIZE:2g,spark.memory.offHeap.size=2g \
-d=PARTIAL_MODE:ABANDONED,spark.gluten.sql.columnar.backend.velox.maxPartialAggregationMemoryRatio=1.0,spark.gluten.sql.columnar.backend.velox.maxExtendedPartialAggregationMemoryRatio=1.0,spark.gluten.sql.columnar.backend.velox.abandonPartialAggregationMinPct=0,spark.gluten.sql.columnar.backend.velox.abandonPartialAggregationMinRows=0 \
-d=PARTIAL_MODE:CACHED,spark.gluten.sql.columnar.backend.velox.maxPartialAggregationMemoryRatio=1.0,spark.gluten.sql.columnar.backend.velox.maxExtendedPartialAggregationMemoryRatio=1.0,spark.gluten.sql.columnar.backend.velox.abandonPartialAggregationMinPct=100,spark.gluten.sql.columnar.backend.velox.abandonPartialAggregationMinRows=0 \
-d=PARTIAL_MODE:FLUSHED,spark.gluten.sql.columnar.backend.velox.maxPartialAggregationMemoryRatio=0.05,spark.gluten.sql.columnar.backend.velox.maxExtendedPartialAggregationMemoryRatio=0.1,spark.gluten.sql.columnar.backend.velox.abandonPartialAggregationMinPct=100,spark.gluten.sql.columnar.backend.velox.abandonPartialAggregationMinRows=0' || true
- name: (To be fixed) TPC-DS SF30.0 Parquet local spark3.2 Q97 low memory
-d=PARTIAL_MODE:FLUSHED,spark.gluten.sql.columnar.backend.velox.maxPartialAggregationMemoryRatio=0.05,spark.gluten.sql.columnar.backend.velox.maxExtendedPartialAggregationMemoryRatio=0.1,spark.gluten.sql.columnar.backend.velox.abandonPartialAggregationMinPct=100,spark.gluten.sql.columnar.backend.velox.abandonPartialAggregationMinRows=0'
- name: (To be fixed) TPC-DS SF30.0 Parquet local spark3.2 Q97 low memory # The case currently causes crash with "free: invalid size".
run: |
docker exec centos7-test-$GITHUB_RUN_ID bash -c 'cd /opt/gluten/tools/gluten-it && \
GLUTEN_IT_JVM_ARGS=-Xmx50G sbin/gluten-it.sh parameterized \
Expand Down Expand Up @@ -491,7 +488,7 @@ jobs:
cd /opt/gluten && \
sudo -E ./dev/vcpkg/setup-build-depends.sh && \
source ./dev/vcpkg/env.sh && \
./dev/builddeps-veloxbe.sh --build_tests=ON --build_benchmarks=ON --enable_s3=ON --enable_hdfs=ON'
./dev/builddeps-veloxbe.sh --build_tests=ON --build_benchmarks=ON --enable_s3=ON --enable_gcs=ON --enable_hdfs=ON'
- name: Build for Spark 3.2.2
run: |
docker exec static-build-test-$GITHUB_RUN_ID bash -c '
Expand Down
10 changes: 5 additions & 5 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,11 +6,11 @@ The contribution is not limited to contributing code. It also includes reviewing

### GitHub Pull Request

If necessary, please file a GitHub issue beforehand and refer to that issue in your PR title.
If necessary, please file a GitHub issue beforehand and reference that issue in your PR title.

By convention, `CH` & `VL` represent ClickHouse & Velox backend respectively. Please add `[CH]` or `[VL]` in your PR title
accordingly if your proposed code change is just for one specific backend. Please do NOT add such tag if common code is changed,
but add `[CORE]` instead.
but add tags like `[CORE]`, `[MINOR]`, `[DOC]` instead.

Examples:
* PR just for CH backend:
Expand Down Expand Up @@ -93,9 +93,9 @@ For critical code change, merging the PR requires at least two committers' appro

**Squash and merge** should be selected to merge PR. If commit message title doesn't align with its PR title, please revise it.

Example for commit message title:
Example for commit message:

`[GLUTEN-<issue ID>][CORE] xxxx (#<PR ID>)`

Please add some brief description, if helpful, in the text box when merging PR. Please do NOT leave any meaningless
historical commit message, such as "Initial commit", "Fix", "Fix code style", etc.
Please add some brief description in extended message section, if helpful, when merging PR. Please do NOT leave any meaningless
historical commit message, such as "Initial commit", "Fix", etc.
Loading

0 comments on commit d296138

Please sign in to comment.