Online Document: https://tapdata.github.io/
Tapdata is a live data platform designed to connect data silos and deliver fresh data to downstream operational applications and operational analytics.
- Make sure Docker is installed on your machine before you get started.
- Tapdata has currently been tested only on Linux (no specific distribution is required).
- Clone the repo:
git clone https://github.com/tapdata/tapdata.git && cd tapdata
release-v2.7
This is the easiest way to experiment with Tapdata:
Run bash build/quick-use.sh
This pulls the Docker image and starts an all-in-one container.
Alternatively, you can build the project from source with the following command:
- Run
bash build/quick-dev.sh
This builds a Docker image from source and starts an all-in-one container.
To build in Docker, install Docker and set tapdata_build_env in build/env.sh to "docker" (the default).
To build locally, install:
- JDK
- Maven
Then set tapdata_build_env in build/env.sh to "local".
To clean the build targets, run bash build/clean.sh
If everything is OK, you should now be in a terminal window. Follow the next steps and give it a try!
# 1. mongodb
source = DataSource("mongodb", "$name").uri("$uri")
# 2. mysql
source = DataSource("mysql", "$name").host("$host").port($port).username("$username").db("$db")
# 3. pg
source = DataSource("postgres", "$name").host("$host").port($port).username("$username").db("$db").schema("$schema").logPluginName("wal2json")
# save will check all config, and load schema from source
source.save()
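The `$name` and `$uri` placeholders above use the same syntax as Python's `string.Template`, so one way to prepare a command before pasting it into the Tapdata shell is to fill them in programmatically. This is only a sketch: `render_mongodb_command` is a hypothetical helper that is not part of Tapdata, and the data source name and URI are made-up example values.

```python
from string import Template

# The MongoDB line from above, kept verbatim as a template.
MONGODB_TEMPLATE = Template('source = DataSource("mongodb", "$name").uri("$uri")')


def render_mongodb_command(name: str, uri: str) -> str:
    """Fill in the $name and $uri placeholders (hypothetical helper)."""
    # substitute() raises KeyError if a placeholder is left unfilled.
    return MONGODB_TEMPLATE.substitute(name=name, uri=uri)


# Example values are assumptions for illustration only.
command = render_mongodb_command("demo_mongo", "mongodb://localhost:27017/demo")
print(command)
```

The same approach should work for the MySQL and PostgreSQL lines, since they use the same placeholder style.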
use $name
Switches the current data source context.
show tables
Lists all tables in the current data source.
desc $table_name
Displays the schema of the given table.
Migration jobs run in real time by default.
# 1. create a pipeline
p = Pipeline("$name")
# 2. use readFrom and writeTo to describe a migrate job
p.readFrom("$source_name.$table").writeTo("$sink_name.$table")
# 3. start job
p.start()
# 4. monitor job
p.monitor()
p.logs()
# 5. stop job
p.stop()
The current version does not support changing the record schema; support is expected within a few days.
If you need to change the record schema in the meantime, use MongoDB as the sink.
# 1. define a python function
def fn(record):
    record["x"] = 1
    return record
# 2. use the processor between source and target
p.readFrom(...).processor(fn).writeTo(...)
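Because the processor is an ordinary Python function, it can be checked on sample records before it is attached to a pipeline. The sketch below assumes that a record is passed to the function as a plain `dict`, which the example above suggests but does not state; the sample records are made up.

```python
def fn(record):
    # Add a constant field "x" to every record, as in the example above.
    record["x"] = 1
    return record


# Hypothetical sample records standing in for rows read from a source.
samples = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

# Copy each record first so the originals stay unchanged for comparison.
transformed = [fn(dict(r)) for r in samples]

for before, after in zip(samples, transformed):
    print(before, "->", after)
```

Testing the function this way keeps mistakes in the transformation logic out of a running job.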
Migration jobs run in real time by default.
# 1. create a pipeline
p = Pipeline("$name")
# 2. use readFrom and writeTo to describe a migrate job; the syntax for multiple tables is slightly different
source = Source("$datasource_name", ["table1", "table2"...])
source = Source("$datasource_name", table_re="xxx.*")
# 3. use prefix/suffix to add a prefix/suffix to the target table names
p.readFrom(source).writeTo("$datasource_name", prefix="", suffix="")
# 4. start job
p.start()
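To preview which tables a pattern such as `table_re="xxx.*"` would select, and what the target tables would be called, the selection and renaming can be simulated offline. This sketch makes two assumptions that the text above does not confirm: that `table_re` is a regular expression that must match the whole table name, and that `prefix` and `suffix` are simply concatenated around the source table name. The table names and the helper `preview_targets` are hypothetical.

```python
import re


def preview_targets(tables, table_re, prefix="", suffix=""):
    """Map each matching source table to its assumed target name."""
    pattern = re.compile(table_re)
    return {
        t: f"{prefix}{t}{suffix}"
        for t in tables
        if pattern.fullmatch(t)  # assumption: the pattern must match the full name
    }


tables = ["orders", "order_items", "users"]
mapping = preview_targets(tables, table_re="order.*", prefix="ods_", suffix="_v1")
print(mapping)
```

If Tapdata matches patterns differently (for example, by searching anywhere in the name), swap `fullmatch` for `search` to mirror that behavior.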
show datasources
Shows all data sources.
delete datasource $name
Deletes the data source if no job is using it.
show jobs
Shows all jobs and their stats.
logs job $job_name [limit=20] [t=5] [tail=True]
Shows the job log.
monitor job $job_name
Continuously shows the job metrics.
status job $job_name
Shows the job status (running/stopped...).
Tapdata uses multiple licenses.
The license for a particular work is determined by the following rules, in order of priority:
- License directly present in the file
- LICENSE file in the same directory as the work
- First LICENSE found when exploring parent directories up to the project top level directory
Defaults to Server Side Public License. For PDK Connectors, the license is Apache V2.
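The second and third rules amount to a walk up the directory tree. As a rough sketch, the lookup could be written as below. `find_license` is a hypothetical helper, the file name `LICENSE` is taken from the rules above, and the first rule (a license stated inside the file itself) is not handled because the text does not say how such a notice is marked.

```python
import tempfile
from pathlib import Path


def find_license(work: Path, project_root: Path):
    """Return the nearest LICENSE file for `work`, or None if there is none.

    Checks the work's own directory first (rule 2), then each parent
    directory up to and including the project root (rule 3).
    """
    root = project_root.resolve()
    current = work.resolve().parent
    while True:
        candidate = current / "LICENSE"
        if candidate.is_file():
            return candidate
        if current == root or current == current.parent:
            return None  # nothing found: the project default applies
        current = current.parent


# Demo on a throwaway directory tree (all paths are made up).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "connectors" / "demo").mkdir(parents=True)
    (root / "connectors" / "LICENSE").write_text("Apache-2.0\n")
    work = root / "connectors" / "demo" / "Main.java"
    work.write_text("// example\n")
    found = find_license(work, root)
    found_relative = found.relative_to(root).as_posix()
    other = root / "README.md"
    other.write_text("readme\n")
    missing = find_license(other, root)

print(found_relative)  # connectors/LICENSE
print(missing)         # None
```

A `None` result corresponds to the default case, where the Server Side Public License applies.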