
Simple Kafka Enablement using StreamSets Data Collector

Creating custom Kafka producers and consumers is often a tedious process that requires manual coding. In this tutorial, we'll see how to use StreamSets Data Collector to create data ingest pipelines that write to Kafka using a Kafka Producer and read from Kafka using a Kafka Consumer, all without writing a line of code.

[Image: Simple Kafka Enablement]

Goals

The goal of this tutorial is to read Avro files from a file system directory and write them to a Kafka topic using the StreamSets Kafka Producer. We'll then use a second pipeline configured with a Kafka Consumer to drain that topic, perform a set of transformations, and send the data to two different destinations.
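For context, here is a rough sketch of what the first pipeline would look like if coded by hand, using the kafka-python and fastavro libraries. The broker address, topic name, and source directory are placeholders rather than values from the tutorial, and this sketch publishes records as JSON, whereas the Data Collector pipeline writes the SDC Record format.

```python
# Hypothetical hand-coded equivalent of the first pipeline: read Avro files
# from a directory and publish each record to a Kafka topic.
# Broker, topic, and directory below are placeholders, not tutorial values.
import glob
import json

from fastavro import reader          # pip install fastavro
from kafka import KafkaProducer      # pip install kafka-python

BROKER = "localhost:9092"
TOPIC = "cc_transactions"            # placeholder topic name
SOURCE_DIR = "sample_data"           # placeholder directory of .avro files

producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda record: json.dumps(record).encode("utf-8"),
)

for path in glob.glob(f"{SOURCE_DIR}/*.avro"):
    with open(path, "rb") as avro_file:
        for record in reader(avro_file):   # fastavro yields each record as a dict
            producer.send(TOPIC, value=record)

producer.flush()   # block until all buffered records are delivered
```

The Data Collector pipelines in this tutorial replace all of this with drag-and-drop configuration.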

Prerequisites

  • A working instance of StreamSets Data Collector
  • A working Kafka instance (see the Kafka Quickstart for an easy local setup; last tested on version 1.1.0, but older and newer versions should work too); an optional broker connectivity check is sketched just after this list
  • A copy of this tutorial's directory, which contains the sample data
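If you want to confirm your local broker is reachable and pre-create a topic for the tutorial, the following is a minimal sketch using the kafka-python admin client. The topic name and broker address are placeholders, not values mandated by the tutorial.

```python
# Optional sanity check: connect to a local Kafka broker and create a topic.
# Topic name and broker address are placeholders.
from kafka.admin import KafkaAdminClient, NewTopic
from kafka.errors import TopicAlreadyExistsError

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

try:
    admin.create_topics([NewTopic(name="cc_transactions",
                                  num_partitions=1,
                                  replication_factor=1)])
    print("Topic created")
except TopicAlreadyExistsError:
    print("Topic already exists")

print(admin.list_topics())   # list topic names visible on the broker
admin.close()
```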

Our Setup

The tutorial's sample data directory contains a set of compressed Avro files with simulated credit card transactions in the following JSON format:

{
  "transaction_date":"dd/mm/YYYY",
  "card_number":"0000-0000-0000-0000",
  "card_expiry_date":"mm/YYYY",
  "card_security_code":"0000",
  "purchase_amount":"$00.00",
  "description":"transaction description of the purchase"
}
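If you'd like to generate a small Avro file of your own in the same shape, for example to experiment beyond the provided sample data, here is a sketch using fastavro. The schema below is inferred from the JSON layout above and is only illustrative; it is not an official schema from the tutorial.

```python
# Write a tiny Avro file whose records follow the JSON shape shown above.
# The schema is inferred from that layout and is only illustrative.
from fastavro import parse_schema, writer

schema = parse_schema({
    "type": "record",
    "name": "CreditCardTransaction",
    "fields": [
        {"name": "transaction_date",   "type": "string"},
        {"name": "card_number",        "type": "string"},
        {"name": "card_expiry_date",   "type": "string"},
        {"name": "card_security_code", "type": "string"},
        {"name": "purchase_amount",    "type": "string"},
        {"name": "description",        "type": "string"},
    ],
})

records = [{
    "transaction_date": "01/01/2018",
    "card_number": "0000-0000-0000-0000",
    "card_expiry_date": "01/2020",
    "card_security_code": "0000",
    "purchase_amount": "$10.00",
    "description": "sample transaction",
}]

with open("sample_transactions.avro", "wb") as out:
    writer(out, schema, records, codec="deflate")  # deflate keeps the file compressed
```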

Data Conversions

We will read Avro files from our source directory and write the records to Kafka using the Data Collector SDC Record data format. A second pipeline will then read the SDC Record data from Kafka, write it to Elasticsearch, and convert it to Avro for delivery to Amazon S3.
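Once the producer pipeline is running, a quick way to confirm data is landing on the topic, before building the consumer pipeline, is to peek at the raw messages. The payloads are in the SDC Record format, which is internal to Data Collector, so this sketch only checks that messages are arriving rather than decoding them; the topic name and broker address are again placeholders.

```python
# Peek at raw messages on the topic to confirm the producer pipeline is writing.
# Payloads are in the SDC Record format, so we only report offsets and sizes.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "cc_transactions",                  # placeholder topic name
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,           # stop iterating after 5s with no messages
)

for i, message in enumerate(consumer):
    print(f"offset={message.offset} bytes={len(message.value)}")
    if i >= 9:                          # ten messages is plenty for a spot check
        break

consumer.close()
```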

Let's Get Started