Merge pull request #258 from apecloud/main
merge from cubetran-core
qianyiwen2019 authored Nov 7, 2024
2 parents 1b350de + ee78c69 commit 2844aa1
Showing 274 changed files with 8,596 additions and 2,358 deletions.
2 changes: 1 addition & 1 deletion .cargo/config.toml
@@ -23,7 +23,7 @@ linker = "aarch64-linux-musl-gcc"
 # linker = "rust-lld"
 linker = "x86_64-linux-gnu-gcc"
 # fix link error for https://github.com/briansmith/ring, refer to: https://github.com/briansmith/ring/issues/1605#issuecomment-1533940670
-rustflags = ["-C", "link-args=-fstack-protector-all -lssp"]
+# rustflags = ["-C", "link-args=-fstack-protector-all -lssp"]
 # rustflags = ["-C", "target-feature=+crt-static"]

 [target.aarch64-unknown-linux-gnu]
55 changes: 55 additions & 0 deletions .github/workflows/build_push_images.yml
@@ -0,0 +1,55 @@
name: Build and push images

on:
  workflow_dispatch:
    inputs:
      branch:
        required: true
        type: string
        default: main
      docker_repository:
        required: true
        type: string
        default: apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com
      image_tag:
        required: true
        type: string
        default: apecloud/ape-dts:latest

jobs:
  build:
    name: Cross build
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.branch }}

      - name: Set up rust
        uses: actions-rs/toolchain@v1
        with:
          toolchain: stable
          override: true

      - name: Install cross
        run: cargo install cross

      - name: Build for x86_64
        run: cross build --target x86_64-unknown-linux-gnu --release

      - name: Build for aarch64
        run: cross build --target aarch64-unknown-linux-gnu --release

      - name: Rename artifact
        run: |
          cp target/x86_64-unknown-linux-gnu/release/dt-main amd64-unknown-linux-gnu-dt-main
          cp target/aarch64-unknown-linux-gnu/release/dt-main arm64-unknown-linux-gnu-dt-main
      - name: Set up docker buildx
        uses: docker/setup-buildx-action@v2

      - name: Build and push docker image
        run: |
          docker login --username ${{ secrets.DOCKER_USER }} --password ${{ secrets.DOCKER_PASSWORD }} ${{ inputs.docker_repository }}
          docker buildx build -f Dockerfile.github.workflow --platform linux/arm64,linux/amd64 --tag ${{ inputs.docker_repository }}/${{ inputs.image_tag }} --push .
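The final step of this workflow pushes to `<docker_repository>/<image_tag>`. The sketch below is not part of the commit. It composes that reference the same way and prints the `docker buildx build` command without running it, which can help when checking non-default inputs. The function names `image_ref` and `buildx_command` are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: join repository and tag as the workflow does.
image_ref() {
  printf '%s/%s\n' "$1" "$2"
}

# Hypothetical helper: print (do not run) the buildx command for the given inputs.
buildx_command() {
  printf 'docker buildx build -f Dockerfile.github.workflow --platform linux/arm64,linux/amd64 --tag %s --push .\n' "$(image_ref "$1" "$2")"
}

# Dry run with the workflow's default inputs.
buildx_command apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com apecloud/ape-dts:latest
```

With the default inputs the tag resolves to `apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/ape-dts:latest`.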
64 changes: 64 additions & 0 deletions .github/workflows/build_upload_s3.yml
@@ -0,0 +1,64 @@
name: Build and upload to S3

on:
  workflow_dispatch:
    inputs:
      branch:
        required: true
        type: string
        default: main
      aws_endpoint:
        required: true
        type: string
        default: https://s3.ap-northeast-1.amazonaws.com
      aws_bucket:
        required: true
        type: string
        default: foxlake
      aws_destination_dir:
        required: true
        type: string
        default: ape-dts/latest

jobs:
  build:
    name: Cross build
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.branch }}

      - name: Set up rust
        uses: actions-rs/toolchain@v1
        with:
          toolchain: stable
          override: true

      - name: Install cross
        run: cargo install cross

      - name: Build for x86_64
        run: cross build --target x86_64-unknown-linux-gnu --release

      - name: Build for aarch64
        run: cross build --target aarch64-unknown-linux-gnu --release

      - name: Copy artifacts
        run: |
          mkdir -p bin/arm64-unknown-linux-gnu bin/amd64-unknown-linux-gnu
          cp target/x86_64-unknown-linux-gnu/release/dt-main bin/amd64-unknown-linux-gnu/ape-dts
          cp target/aarch64-unknown-linux-gnu/release/dt-main bin/arm64-unknown-linux-gnu/ape-dts
          cp log4rs.yaml bin/amd64-unknown-linux-gnu/
          cp log4rs.yaml bin/arm64-unknown-linux-gnu/
      - name: Upload to s3
        uses: shallwefootball/s3-upload-action@master
        with:
          aws_key_id: ${{ secrets.AWS_KEY_ID }}
          aws_secret_access_key: ${{ secrets.AWS_SECRET_ACCESS_KEY}}
          aws_bucket: ${{ inputs.aws_bucket }}
          endpoint: ${{ inputs.aws_endpoint }}
          source_dir: bin
          destination_dir: ${{ inputs.aws_destination_dir }}
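The "Copy artifacts" step produces one directory per architecture, each holding the binary renamed to `ape-dts` plus `log4rs.yaml`. The sketch below is not part of the commit. It reproduces that layout as a reusable function so it can be tried locally against any directories. The function name `stage_artifacts` and its three positional arguments are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper that mirrors the "Copy artifacts" step.
# $1 = cargo target directory, $2 = output directory, $3 = path to log4rs.yaml
stage_artifacts() {
  target_dir=$1
  out_dir=$2
  log_config=$3
  # Each pair is <Rust arch in the target triple>:<label used in the output directory name>.
  for pair in x86_64:amd64 aarch64:arm64; do
    rust_arch=${pair%%:*}
    label=${pair##*:}
    dest="$out_dir/$label-unknown-linux-gnu"
    mkdir -p "$dest" || return 1
    # The binary is built as dt-main and shipped as ape-dts.
    cp "$target_dir/$rust_arch-unknown-linux-gnu/release/dt-main" "$dest/ape-dts" || return 1
    cp "$log_config" "$dest/" || return 1
  done
}
```

Calling `stage_artifacts target bin log4rs.yaml` from the repository root after both `cross build` steps should yield the same `bin/` tree that the workflow uploads.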
10 changes: 6 additions & 4 deletions .github/workflows/ci.yml
@@ -1,10 +1,12 @@
 name: CI # Continuous Integration

 on:
-  push:
-    branches:
-      - main
-  pull_request:
+  workflow_call:
+    inputs:
+      branch:
+        required: false
+        type: string
+        default: main

 env:
   GITHUB_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
8 changes: 6 additions & 2 deletions Cargo.toml
@@ -18,7 +18,7 @@ assigning_clones = "allow"
 [workspace.dependencies]
 sqlx = {git = "https://github.com/qianyiwen2019/sqlx", features = ["runtime-async-std-rustls", "mysql", "postgres", "bigdecimal", "ipnetwork", "mac_address", "bit-vec", "time", "chrono", "json", "uuid"]}
 futures = "0.3.25"
-mysql-binlog-connector-rust = { git = "https://github.com/apecloud/mysql-binlog-connector-rust" }
+mysql-binlog-connector-rust = "0.2.1"
 serde = {version = "1", features = ["derive"]}
 serde_json = "1.0.87"
 chrono = "0.4.38"
@@ -52,7 +52,7 @@ uuid = { version = "1.3.1", features = ["v4"] }
 nom = "7.1.3"
 mongodb = { version = "2.5.0" }
 dotenv = "0.15.0"
-redis = { version = "0.23.1", features = ["tokio-comp"] }
+redis = { version = "0.27.4", features = ["tokio-comp"] }
 thiserror = "1.0.44"
 async-std = "1.12.0"
 apache-avro = { version = "0.16.0", features = ["derive"]}
@@ -61,3 +61,7 @@ mlua = { version = "0.9.7", features = ["lua54", "vendored"] }
 anyhow = {version = "1.0.83", features = ["backtrace"]}
 rust_decimal = "1.35.0"
 orc-format = {git = "https://github.com/qianyiwen2019/orc-format"}
+openssl = { version = "0.10", features = ["vendored"] }
+openssl-sys = { version = "0.9", features = ["vendored"]}
+actix-web = "4.9.0"
+hex = "0.4.3"
18 changes: 18 additions & 0 deletions Cross.toml
@@ -0,0 +1,18 @@
# refer to: https://github.com/cross-rs/cross/blob/main/docs/config_file.md
[build]
default-target = "x86_64-unknown-linux-gnu" # use this target if none is explicitly provided
pre-build = [ # additional commands to run prior to building the package
"dpkg --add-architecture $CROSS_DEB_ARCH",
"apt-get update && apt-get -y install pkg-config libssl-dev:$CROSS_DEB_ARCH cmake libclang-dev",
]

[target.x86_64-unknown-linux-gnu]
linker = "x86_64-linux-gnu-gcc"
# disable target-feature=+crt-static since it causes failures in domain resolution
# https://github.com/rust-lang/rust/issues/100711
# https://stackoverflow.com/questions/73402445/rust-x86-64-unknown-linux-gnu-using-target-feature-crt-static-exits-with-seg
# rustflags = ["-C", "target-feature=+crt-static"]

[target.aarch64-unknown-linux-gnu]
linker = "aarch64-linux-gnu-gcc"
# rustflags = ["-C", "target-feature=+crt-static"]
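The `pre-build` commands rely on `$CROSS_DEB_ARCH`, which `cross` sets inside its build container. The sketch below is not part of the commit. It prints what those two commands would look like after substitution, assuming the variable holds the Debian architecture name (`amd64` for the x86_64 target, `arm64` for the aarch64 target). Both function names are hypothetical, and only the two targets configured in this file are covered.

```shell
#!/bin/sh
# Hypothetical helper: Debian architecture name assumed for each target in Cross.toml.
deb_arch_for_target() {
  case "$1" in
    x86_64-unknown-linux-gnu) echo amd64 ;;
    aarch64-unknown-linux-gnu) echo arm64 ;;
    *) echo "unsupported target: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print the pre-build commands with $CROSS_DEB_ARCH filled in.
# Nothing is executed; this is a dry run for inspection only.
expanded_pre_build() {
  arch=$(deb_arch_for_target "$1") || return 1
  printf 'dpkg --add-architecture %s\n' "$arch"
  printf 'apt-get update && apt-get -y install pkg-config libssl-dev:%s cmake libclang-dev\n' "$arch"
}

expanded_pre_build aarch64-unknown-linux-gnu
```

The point of the architecture suffix on `libssl-dev` is that the OpenSSL headers and libraries installed in the container match the target being linked, not the host.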
10 changes: 10 additions & 0 deletions Dockerfile.github.workflow
@@ -0,0 +1,10 @@
FROM gcr.io/distroless/cc:debug

ARG MODULE_NAME="dt-main"
ARG TARGETOS
ARG TARGETARCH

COPY --chmod=777 ${TARGETARCH}-unknown-${TARGETOS}-gnu-${MODULE_NAME} /ape-dts
COPY log4rs.yaml /log4rs.yaml

ENTRYPOINT [ "/ape-dts" ]
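The `COPY` source in this Dockerfile is assembled from the `TARGETARCH` and `TARGETOS` build arguments, so it only resolves if files with matching names exist in the build context. The sketch below is not part of the commit. It derives the expected file name for each platform passed to `--platform`, assuming plain `os/arch` platform strings with no variant suffix. The function name `copy_source_for_platform` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compute the COPY source for one platform.
# $1 = platform in os/arch form (e.g. linux/amd64); variants such as linux/arm/v7 are not handled.
# $2 = module name, defaulting to dt-main as in the Dockerfile's ARG.
copy_source_for_platform() {
  platform=$1
  module=${2:-dt-main}
  target_os=${platform%%/*}    # corresponds to TARGETOS
  target_arch=${platform#*/}   # corresponds to TARGETARCH
  printf '%s-unknown-%s-gnu-%s\n' "$target_arch" "$target_os" "$module"
}

# The two platforms used by the "Build and push images" workflow.
for p in linux/arm64 linux/amd64; do
  copy_source_for_platform "$p"
done
```

The resulting names, `arm64-unknown-linux-gnu-dt-main` and `amd64-unknown-linux-gnu-dt-main`, are the files created by the workflow's "Rename artifact" step.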
File renamed without changes.
131 changes: 80 additions & 51 deletions README.md
@@ -8,48 +8,97 @@
- In Rust.


## Tasks supported

| | mysql -> mysql | pg -> pg | mongo -> mongo | redis -> redis |
| :-------- | :-------- | :-------- | :-------- | :-------- |
| Snapshot | ✔ | ✔ | ✔ | ✔ |
| CDC | ✔ | ✔ | ✔ | ✔ |
| Data check/revise/review | ✔ | ✔ | ✔ | |
| Structure migration/check | ✔ | ✔ | | |


# Quick start

## Run demo in docker

Task configurations are in ini format. For more details, refer to [config details](./docs/en/config.md), [snapshot migration](./docs/en/snapshot/migration.md) and [CDC data sync](./docs/en/cdc/migration.md).

```
docker run -it \
--entrypoint sh \
-v [absolute-path]/task_config.ini:/task_config.ini \
apecloud/ape-dts:0.1.13.hotfix4 \
-c "/ape-dts /task_config.ini"
```
## Supported task types
| | mysql -> mysql | pg -> pg | mongo -> mongo | redis -> redis | mysql -> kafka | pg -> kafka| mysql -> starrocks |
| :-------- | :-------- | :-------- | :-------- | :-------- | :-------- | :-------- | :-------- |
| Snapshot | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| CDC | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Data check/revise/review | ✔ | ✔ | ✔ | | | | |
| Structure migration/check | ✔ | ✔ | | | | |


# Quick starts

## Tutorial
- [prerequisites](./docs/en/tutorial/prerequisites.md)
- [mysql -> mysql](./docs/en/tutorial/mysql_to_mysql.md)
- [pg -> pg](./docs/en/tutorial/pg_to_pg.md)
- [mongo -> mongo](./docs/en/tutorial/mongo_to_mongo.md)
- [redis -> redis](./docs/en/tutorial/redis_to_redis.md)
- [mysql -> starrocks](./docs/en/tutorial/mysql_to_starrocks.md)
- [mysql -> kafka -> consumer](./docs/en/tutorial/mysql_to_kafka_consumer.md)
- [pg -> kafka -> consumer](./docs/en/tutorial/pg_to_kafka_consumer.md)
- [mysql -> ape_dts(HTTP server) -> consumer](./docs/en/tutorial/mysql_to_http_server_consumer.md)
- [pg -> ape_dts(HTTP server) -> consumer](./docs/en/tutorial/pg_to_http_server_consumer.md)
- [snapshot + cdc without data loss](./docs/en/tutorial/snapshot_and_cdc_without_data_loss.md)
- [modify data by lua](./docs/en/tutorial/etl_by_lua.md)

## Run tests

Refer to [docs](./dt-tests/README.md) for more details.
Refer to [test docs](./dt-tests/README.md) for details.

# More docs
- Configurations
- [config details](./docs/en/config.md)
- Structure tasks
- [migration](./docs/en/structure/migration.md)
- [check](./docs/en/structure/check.md)
- [check by Liquibase](./docs/en/structure/check_by_liquibase.md)
- Snapshot tasks
- [migration](./docs/en/snapshot/migration.md)
- [check](./docs/en/snapshot/check.md)
- [revise](./docs/en/snapshot/revise.md)
- [review](./docs/en/snapshot/review.md)
- [resume at breakpoint](./docs/en/snapshot/resume.md)
- CDC tasks
- [data sync](./docs/en/cdc/migration.md)
- [heartbeat](./docs/en/cdc/heartbeat.md)
- [two-way data sync](./docs/en/cdc/two_way.md)
- [data sync](./docs/en/cdc/sync.md)
- [heartbeat to source database](./docs/en/cdc/heartbeat.md)
- [two-way data sync](./docs/en/cdc/two_way.md)
- [generate sqls from CDC](./docs/en/cdc/to_sql.md)
- [resume at breakpoint](./docs/en/cdc/resume.md)
- Custom consumers
- [mysql/pg -> kafka -> consumer](./docs/en/consumer/kafka_consumer.md)
- [mysql/pg -> ape_dts(HTTP server) -> consumer](./docs/en/consumer/http_consumer.md)
- Data processing
- [custom lua script](./docs/en/etl/lua.md)
- [modify data by lua](./docs/en/etl/lua.md)
- Monitor
- [monitor info](./docs/en/monitor/monitor.md)
- [position info](./docs/en/monitor/position.md)
- Task templates
- [mysql -> mysql](./docs/templates/mysql_to_mysql.md)
- [pg -> pg](./docs/templates/pg_to_pg.md)
- [mongo -> mongo](./docs/templates/mongo_to_mongo.md)
- [redis -> redis](./docs/templates/redis_to_redis.md)
- [mysql/pg -> kafka](./docs/templates/rdb_to_kafka.md)
- [mysql/pg -> ape_dts(HTTP server)](./docs/templates/rdb_to_http_server.md)
- [mysql -> starrocks](./docs/templates/mysql_to_starrocks.md)

# Benchmark
- MySQL -> MySQL, Snapshot

| Method | Node Specs | RPS(rows per second) | Source MySQL Load (CPU/Memory) | Target MySQL Load (CPU/Memory) |
| :-------- | :-------- | :-------- | :-------- | :-------- |
| ape_dts | 1c2g | 71428 | 8.2% / 5.2% | 211% / 5.1% |
| ape_dts | 2c4g | 99403 | 14.0% / 5.2% | 359% / 5.1% |
| ape_dts | 4c8g | 126582 | 13.8% / 5.2% | 552% / 5.1% |
| debezium | 4c8g | 4051 | 21.5% / 5.2% | 51.2% / 5.1% |

- MySQL -> MySQL, CDC

| Method | Node Specs | RPS(rows per second) | Source MySQL Load (CPU/Memory) | Target MySQL Load (CPU/Memory) |
| :-------- | :-------- | :-------- | :-------- | :-------- |
| ape_dts | 1c2g | 15002 | 18.8% / 5.2% | 467% / 6.5% |
| ape_dts | 2c4g | 24692 | 18.1% / 5.2% | 687% / 6.5% |
| ape_dts | 4c8g | 26287 | 18.2% / 5.2% | 685% / 6.5% |
| debezium | 4c8g | 2951 | 20.4% / 5.2% | 98% / 6.5% |

- Image size

| ape_dts:2.0.2 | debezium/connect:2.7 |
| :-------- | :-------- |
| 86.4 MB | 1.38 GB |

- more benchmark [details](./docs/en/benchmark.md)

# Contributions

@@ -61,33 +110,13 @@ Refer to [docs](./dt-tests/README.md) for more details.
 - dt-main: program entry
 - dt-connector: extractors + sinkers for databases
 - dt-pipeline: pipeline to connect extractors and sinkers
-- dt-parallelizer: various parallel algorithms
+- dt-parallelizer: parallel algorithms
 - dt-task: create extractors + sinkers + pipelines + parallelizers according to configurations
 - dt-common: common utils, basic data structures, metadata management
 - dt-tests: integration tests

-## Coding
-
-```
-cargo build
-cargo clippy --workspace
-```
-
-## Build docker image
-
-- arm64
-```
-docker buildx build \
---platform linux/arm64 --tag ape-dts:0.1.0-test-arm64 \
---build-arg MODULE_NAME=dt-main --load .
-```
-
-- amd64
-```
-docker buildx build \
---platform linux/amd64 --tag ape-dts:0.1.0-test-amd64 \
---build-arg MODULE_NAME=dt-main --load .
-```
+## Build images
+[build images](./docs/en/build_images.md)

 # Contact us
