This folder contains pipeline templates and samples for StreamSets Data Collector.
The following templates/samples are currently available:
Name | Description |
---|---|
Citi Bike real-time system data (Basic) | Reads unstructured, hierarchical data from a REST API and converts it to a relational format |
Date Conversions | Convert dates from string to various datetime formats and timezones using Field Type Converter and Expression Evaluator processors |
Drift Synchronization for Hive | Drift Synchronization from MySQL to the Cloudera distribution of Apache Hive and Apache Impala |
Hadoop FS to ADLS Gen2 | Load data from Hadoop FS to ADLS Gen2 after performing some transformations |
ML - TensorFlow Binary Classification | Load a pre-trained TensorFlow model to classify cancer condition as either benign or malignant |
MySQL CDC to Delta Lake | Reads MySQL change data capture (CDC) data and writes to Databricks Delta Lake |
MySQL CDC to S3 to Snowflake | Reads MySQL change data capture (CDC) data and writes to S3, then reads from S3 and writes to Snowflake |
MySQL CDC to Snowflake | Reads MySQL change data capture (CDC) data and writes to Snowflake |
MySQL Schema Replication to Azure Synapse SQL | Bulk load data from MySQL into Azure Synapse SQL |
MySQL Schema replication to Delta Lake | Bulk load data from MySQL into Databricks Delta Lake |
MySQL binlog to DeltaLake | Reads MySQL binlog changed data and writes to Databricks Delta Lake |
NYC Taxi Ride Payment Type (Basic) | Reads data from a directory, processes it, routes it, masks sensitive data, and writes it to another file system in a different data format |
NYC Taxi Ride Payment Type (with Jython) | Reads data from a directory, processes it using Jython, routes it, masks sensitive data, and writes it to another file system in a different data format |
Oracle 19c Bulk Ingest and CDC to Databricks Delta Lake | Bulk ingest data from Oracle 19c and process Change Data Capture (CDC) into Databricks Delta Lake |
Oracle CDC to Delta Lake | Reads change data capture (CDC) data from Oracle and writes to Databricks Delta Lake |
Oracle CDC to Snowflake | Reads change data capture (CDC) data from Oracle and writes to Snowflake |
Parse Twitter Data to JSON | Parse raw Twitter data and store curated data in JSON format |
Parse Web Logs to JSON and Avro | Parse raw web logs ingested in Common Log Format and store curated data in JSON and Avro formats |
PostgreSQL CDC to Delta Lake | Reads change data capture (CDC) data from PostgreSQL and writes to Databricks Delta Lake |
PostgreSQL CDC to Snowflake | Reads change data capture (CDC) data from PostgreSQL and writes to Snowflake |
SQLServer CDC to Delta Lake | Reads change data capture (CDC) data from SQL Server and writes to Databricks Delta Lake |
SQLServer CDC to Snowflake | Reads change data capture (CDC) data from SQL Server and writes to Snowflake |
Salesforce CDC to Delta Lake | Reads change data capture (CDC) data from Salesforce and writes to Databricks Delta Lake |
Salesforce CDC to Snowflake | Reads change data capture (CDC) data from Salesforce and writes to Snowflake |
Salesforce to Delta Lake | Bulk load data from Salesforce accounts into Databricks Delta Lake |
Working with XML (Basic) | Read and process XML data in Data Collector |
aws-marketplace-reports | Bulk load data from Salesforce accounts into Databricks Delta Lake |
For any questions or comments related to these pipelines, reach out on any of these channels: