This demo showcases an AWS Kinesis -> Confluent Cloud -> cloud storage pipeline.
You can choose the cloud storage provider (Google Cloud Storage or AWS S3) that best fits your business needs.
Benefits of Confluent Cloud:
- Build business applications on a full event streaming platform
- Span multiple cloud providers (AWS, GCP, Azure) and on-prem datacenters
- Use Kafka to aggregate data into a single source of truth
- Harness the power of KSQL
This demo showcases an end-to-end streaming ETL deployment built entirely on cloud services. It runs on Confluent Platform and includes:
- Kinesis source connector: reads from a Kinesis stream and writes the data to a Kafka topic (a registration sketch follows this list)
- KSQL: streaming SQL engine that enables real-time data processing against Kafka (a query sketch follows the table below)
- GCS or S3 sink connector: pushes data from Kafka topics to cloud storage
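As a concrete illustration, a Kafka Connect source connector is registered through the Connect REST API. This is only a sketch: `start.sh` does this for you, and the exact property names and values (connector class, region) depend on your Kinesis connector version and setup.

```bash
# Sketch only: start.sh registers the connector for you. Property names
# follow the Confluent Kinesis source connector and may vary by version;
# the region value here is a placeholder.
curl -X POST -H "Content-Type: application/json" \
     --data '{
       "name": "demo-kinesis-source",
       "config": {
         "connector.class": "io.confluent.connect.kinesis.KinesisSourceConnector",
         "tasks.max": "1",
         "kinesis.stream": "s1",
         "kinesis.region": "us-west-2",
         "kafka.topic": "locations"
       }
     }' \
     http://localhost:8083/connectors
```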
Component | Consumes From | Produces To
---|---|---
Kinesis source connector | Kinesis stream `s1` | Kafka topic `locations`
KSQL | `locations` | KSQL streams and tables (`ksql.commands`)
GCS (or S3) sink connector | KSQL tables `COUNT_PER_CITY`, `SUM_PER_CITY` | GCS (or S3)
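To give a flavor of the KSQL step, the sketch below shows how tables like `COUNT_PER_CITY` and `SUM_PER_CITY` could be derived from the `locations` topic. The actual statements live in `ksql.commands` and are submitted by `start.sh`; the field names used here (`city`, `amount`) are placeholders, not the demo's real schema.

```bash
# Sketch only: the real statements live in ksql.commands. Field names
# (city, amount) are placeholders for the demo's actual schema.
ksql http://localhost:8088 <<EOF
CREATE STREAM locations_stream (city VARCHAR, amount BIGINT)
  WITH (KAFKA_TOPIC='locations', VALUE_FORMAT='JSON');
CREATE TABLE COUNT_PER_CITY AS
  SELECT city, COUNT(*) AS row_count FROM locations_stream GROUP BY city;
CREATE TABLE SUM_PER_CITY AS
  SELECT city, SUM(amount) AS total_amount FROM locations_stream GROUP BY city;
EOF
```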
As with the other demos in this repo, you can run the entire demo end-to-end with `./start.sh`, and it runs against your local Confluent Platform install. This requires the following:
- Common demo prerequisites
- Confluent Platform 5.3
- An initialized Confluent Cloud cluster used for development only
- AWS: `aws` CLI, properly initialized with your credentials
- GCS: `gsutil`, properly initialized with your credentials
- `jq`
- `curl`
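Before starting, you can sanity-check this tooling with standard version and identity commands, for example:

```bash
# Quick sanity check of the required tools before running the demo.
aws sts get-caller-identity   # AWS credentials are set up (S3 destination)
gsutil version                # gsutil is installed (GCS destination)
jq --version
curl --version
```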
- Configure the cloud storage provider and other demo parameters in the `config/demo.cfg` file. In particular, be sure to set the `DESTINATION_STORAGE` parameter appropriately for Google GCS or AWS S3, and set the appropriate region (a sample configuration is sketched after this list).
- Run the demo:

  ```bash
  $ ./start.sh
  ```
- View all the Kinesis, Kafka, and cloud storage data after running the demo (manual spot-check commands are sketched after this list):

  ```bash
  $ ./read-data.sh
  ```
- Stop the demo and clean up:

  ```bash
  $ ./stop.sh
  ```
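For the configuration step above, `config/demo.cfg` might look roughly like the following. `DESTINATION_STORAGE` comes from the demo itself; the region and bucket parameter names shown here are placeholders, so check the comments in `config/demo.cfg` for the exact names your version expects.

```bash
# Sample config/demo.cfg (sketch). DESTINATION_STORAGE is the demo's
# parameter; STORAGE_REGION and STORAGE_BUCKET are placeholder names --
# check config/demo.cfg itself for the exact parameters.
DESTINATION_STORAGE='s3'          # or 'gcs'
STORAGE_REGION='us-west-2'        # placeholder: region for your bucket
STORAGE_BUCKET='my-demo-bucket'   # placeholder: must be globally unique
```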
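As a complement to `read-data.sh`, you can spot-check the sink output directly with the respective cloud CLI. The bucket name below is a placeholder for whatever you configured in `config/demo.cfg`.

```bash
# Optional spot-check of the sink output; the bucket name is whatever
# you configured in config/demo.cfg.
aws s3 ls s3://my-demo-bucket --recursive   # if DESTINATION_STORAGE='s3'
gsutil ls -r gs://my-demo-bucket            # if DESTINATION_STORAGE='gcs'
```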