OpenSourced Project to sync Databases & Datasources across multiple Destinations

nishantnath/pyfission

pyFission

pyFission syncs tables and schemas across databases as defined in the configs/fission.py file. If the src_table and src_schema args are not provided, it syncs all tables and schemas specified in the fission.py config file by launching multiple bash commands in parallel.
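The parallel fan-out described above can be pictured with a minimal sketch. This is not pyFission's actual implementation: the helper names build_sync_command and sync_all, the thread pool, and the max_workers default are assumptions made for illustration. Only the CLI flags (--src, --dest, sync, --src_schema, --src_table, --out_format) come from the usage section of this README.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def build_sync_command(src, dest, schema, table, out_format="csv"):
    """Build one per-table sync command (hypothetical helper, not part of pyfission)."""
    return [
        sys.executable, "-m", "pyfission",
        "--src", src, "--dest", dest,
        "sync",
        "--src_schema", schema,
        "--src_table", table,
        "--out_format", out_format,
    ]


def sync_all(src, dest, tables, out_format="csv", max_workers=4,
             runner=subprocess.call):
    """Run one command per (schema, table) pair in parallel.

    Returns the exit codes in the same order as `tables`. The `runner`
    argument is an assumption added so the fan-out can be tried without
    actually launching any process.
    """
    commands = [build_sync_command(src, dest, s, t, out_format) for s, t in tables]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, commands))
```

Passing a stub such as runner=lambda cmd: 0 lets you inspect the generated commands before running them for real.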

The pyfission module has two methods of syncing tables:

  1. full: truncates the destination table, then performs a full sync
  2. incremental: syncs changes based on the primary key and replication key provided in the fission.py config file
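The difference between the two methods can be illustrated on in-memory rows. The sketch below is an assumption about the general technique, not pyFission's code: it treats the largest replication-key value already in the destination as a high-water mark and upserts newer source rows by primary key. The function names and the list-of-dicts table representation are hypothetical.

```python
def full_sync(src_rows, dest_rows):
    """Truncate the destination, then copy every source row."""
    dest_rows.clear()
    dest_rows.extend(dict(row) for row in src_rows)
    return dest_rows


def incremental_sync(src_rows, dest_rows, primary_key, replication_key):
    """Upsert source rows that are newer than anything in the destination."""
    # Highest replication-key value already synced (None for an empty table).
    high_water = max((r[replication_key] for r in dest_rows), default=None)
    by_pk = {r[primary_key]: r for r in dest_rows}
    for row in src_rows:
        if high_water is None or row[replication_key] > high_water:
            # Insert a new row or overwrite the existing one with the same key.
            by_pk[row[primary_key]] = dict(row)
    dest_rows[:] = by_pk.values()
    return dest_rows
```

Note that a high-water mark of this kind only picks up inserts and updates; rows deleted at the source would need a full sync to disappear from the destination.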

pyFission automatically detects and builds the schema definition for tables, including nested and repeated fields for JSON-like schemas (e.g. BigQuery). It also patches tables automatically when new fields are added.
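As a rough illustration of schema detection, the sketch below infers field definitions from a single JSON-like record and lists the fields a table patch would need to add. It is a simplified assumption, not pyFission's detection logic: the function names are hypothetical, only a few scalar types are handled, and the type and mode labels follow BigQuery's naming (RECORD, REPEATED, NULLABLE).

```python
def infer_field(name, value):
    """Infer one field definition from a sample value."""
    mode = "NULLABLE"
    if isinstance(value, list):
        # A list maps to a repeated field; use its first element as the sample.
        mode = "REPEATED"
        value = value[0] if value else None
    if isinstance(value, dict):
        # A dict maps to a nested record with its own sub-fields.
        return {"name": name, "type": "RECORD", "mode": mode,
                "fields": infer_schema(value)}
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        field_type = "BOOLEAN"
    elif isinstance(value, int):
        field_type = "INTEGER"
    elif isinstance(value, float):
        field_type = "FLOAT"
    else:
        field_type = "STRING"
    return {"name": name, "type": field_type, "mode": mode}


def infer_schema(record):
    """Infer a list of field definitions from one JSON-like record (a dict)."""
    return [infer_field(name, value) for name, value in record.items()]


def missing_fields(existing_schema, new_schema):
    """Return top-level fields in new_schema that existing_schema lacks."""
    known = {field["name"] for field in existing_schema}
    return [field for field in new_schema if field["name"] not in known]
```

A real implementation would sample many rows and reconcile conflicting types; a single record is used here only to keep the example short.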

Getting started

Usage

  • Modify configs/custom_config.py and configs/fission.py with DB credentials and sync config, respectively
  • Add service account private key files/secrets to secrets_storage dir if needed
$ python -m pyfission --src [] --dest [] sync --help
usage: __main__.py pyfission [-h] [--src_table SRC_TABLE]
                          [--src_schema SRC_SCHEMA] [--src_db SRC_DB]
                          [--dest_table DEST_TABLE]
                          [--dest_schema DEST_SCHEMA] [--dest_db DEST_DB]
                          [--method {full,incremental}]
                          [--out_format {csv,json}]

optional arguments:
  -h, --help            show this help message and exit
  --src_table SRC_TABLE
                        Overrides table definition from pyfission configs
  --src_schema SRC_SCHEMA
                        Overrides schema definition from pyfission configs
  --src_db SRC_DB       Overrides database definition from pyfission configs
  --dest_table DEST_TABLE
                        Overrides table definition from pyfission configs
  --dest_schema DEST_SCHEMA
                        Overrides schema definition from pyfission configs
  --dest_db DEST_DB     Overrides database definition from pyfission configs
  --method {full,incremental}
                        Method of Replication
  --out_format {csv,json}
                        Format of output files
  • Recommended out_format:

    • json for BigQuery
    • csv for others
  • To sync a particular table, specify the src_schema and src_table args

    • dest_table will have same name as src_table unless explicitly specified
    • dest_schema will be set from fission.py config file's fission_dest dict
    • src_db and dest_db will be derived from src and dest
python -m pyfission --src [] --dest [] sync --src_table [] --src_schema [] --out_format []
  • To sync an entire DB, provide only the src and dest args
python -m pyfission --src [] --dest [] sync --out_format []

Installation

Get the latest release from PyPI

pip install pyfission

Contribution/Local installation

  1. Clone the repo
$ git clone https://github.com/nishantnath/pyfission.git
  2. Change the working directory to pyfission
$ cd pyfission
  3. Install Python 3 and the requirements if they are not already installed
$ pip install -r requirements.txt
