# GIDS 2024 Demo

Supercharging Kafka Applications with Real-time Contextual Insights

## Setup

1. Pull and run Kafka: `docker pull apache/kafka:3.7.0`, then `docker run -d -p 9092:9092 --name kafka apache/kafka:3.7.0`.
2. Start Hazelcast with `hz start` (see the installation guidelines). If you use Enterprise Edition, export the licence first: `export HZ_LICENSEKEY=$(cat ~/hazelcast/demo.license)`.
3. Start Management Center with `hz-mc start`.
4. Configure CLC by adding a config: `clc config add dev cluster.name=dev cluster.address=localhost:5701`.
5. Connect with `clc -c dev`.
6. Optional: exec into the container with `docker exec -it kafka bash` and `cd /opt/kafka/bin`.
7. Optional: follow the same steps in any SQL client, such as DBeaver.

## Demo

The goal is to read from a Kafka topic emitting a JSON e-commerce data stream. Each message contains `id`, `customer`, `price`, `order_ts`, and `amount`.
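For illustration, a single order event on the topic might look like the following. The field names match the mapping created below; the values themselves are made up:

```python
import json

# A hypothetical order event as it would appear on the Kafka topic.
# Field names follow the `orders` mapping; values are illustrative only.
raw = '{"id": 1, "customer": "Daffy", "price": 12.6, "order_ts": "2022-02-21T23:00:00", "amount": 42}'

order = json.loads(raw)
print(order["customer"], order["amount"])
```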

1. First, create a Mapping for the incoming messages:

```sql
CREATE OR REPLACE MAPPING orders (
    id BIGINT,
    customer VARCHAR,
    price DECIMAL,
    order_ts TIMESTAMP,
    amount BIGINT)
TYPE KAFKA
OPTIONS(
    'valueFormat'='json-flat',
    'bootstrap.servers' = '127.0.0.1:9092'
);
```
2. Start reading from the topic:

```sql
SELECT customer AS customer, ROUND(price,2) AS Price, amount AS "Sold" FROM orders;
```

Nothing shows. The topic has no data because there are no producers yet, so let's generate some random data and push it into the topic.

3. Randomly create data and push it into the Kafka topic (only to read it back and process it inside Hazelcast). Here is a sample continuous query that forms the base of our data generator; it uses the built-in stream generator:

```sql
SELECT v as id,
       RAND(v*v) as userRand,
       TO_TIMESTAMP_TZ(v*10 + 1645484400000) as order_ts,
       ROUND(RAND()*100, 0) as amount
  FROM TABLE(generate_stream(100));
```

We can now create a Hazelcast JOB that will generate and push sample data into the Kafka topic called `orders`.

```sql
CREATE JOB IF NOT EXISTS push_orders
    OPTIONS (
    'processingGuarantee' = 'exactlyOnce',
    'snapshotIntervalMillis' = '5000')
    AS
    SINK INTO orders (
        SELECT id,
        CASE WHEN userRand BETWEEN 0 AND 0.1 THEN 'Elmer Fudd'
                WHEN userRand BETWEEN 0.1 AND 0.2 THEN 'Speedy Gonzales'
                WHEN userRand BETWEEN 0.2 AND 0.3 THEN 'Daffy'
                WHEN userRand BETWEEN 0.3 AND 0.4 THEN 'Sylvester'
                WHEN userRand BETWEEN 0.4 AND 0.5 THEN 'Porky'
                WHEN userRand BETWEEN 0.5 AND 0.6 THEN 'Tweety'
                WHEN userRand BETWEEN 0.6 AND 0.7 THEN 'Bugs Bunny'
                WHEN userRand BETWEEN 0.7 AND 0.8 THEN 'Marvin Martian'
                ELSE 'Wile E. Coyote'
        END as customer,
        CASE WHEN userRand BETWEEN 0 AND 0.1 THEN userRand*50+1
                WHEN userRand BETWEEN 0.1 AND 0.2 THEN userRand*75+.6
                WHEN userRand BETWEEN 0.2 AND 0.3 THEN userRand*60+.2
                WHEN userRand BETWEEN 0.3 AND 0.4 THEN userRand*30+.3
                WHEN userRand BETWEEN 0.4 AND 0.5 THEN userRand*43+.7
                WHEN userRand BETWEEN 0.5 AND 0.6 THEN userRand*100+.4
                WHEN userRand BETWEEN 0.6 AND 0.7 THEN userRand*25+.8
                WHEN userRand BETWEEN 0.7 AND 0.8 THEN userRand*10+.1
                ELSE userRand*100+4
        END as price,
        order_ts,
        amount
        FROM
            (SELECT v as id,
                RAND(v*v) as userRand,
                TO_TIMESTAMP_TZ(v*10 + 1645484400000) as order_ts,
                ROUND(RAND()*100, 0) as amount
            FROM TABLE(generate_stream(100)))
    );
```
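The CASE expressions above simply bucket a uniform random value in [0, 1) into a customer name (and, similarly, a price). A rough Python analogue of the name bucketing, purely for illustration of the logic (this is not how Hazelcast executes it):

```python
# Bucket a uniform random value in [0, 1) into a customer name,
# mirroring the first CASE expression in the job above.
CUSTOMERS = [
    "Elmer Fudd", "Speedy Gonzales", "Daffy", "Sylvester", "Porky",
    "Tweety", "Bugs Bunny", "Marvin Martian",
]

def customer_for(user_rand: float) -> str:
    # Each 0.1-wide bucket maps to one customer; values of 0.8 and
    # above fall through to the ELSE branch, just like the SQL.
    idx = int(user_rand * 10)
    return CUSTOMERS[idx] if idx < len(CUSTOMERS) else "Wile E. Coyote"

print(customer_for(0.05))  # Elmer Fudd
print(customer_for(0.95))  # Wile E. Coyote
```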
4. Query the topic again:

```sql
SELECT customer AS customer, ROUND(price,2) AS Price, amount AS "Sold" FROM orders;
```

Limit the output to one customer:

```sql
SELECT customer AS customer, ROUND(price,2) AS Price, amount AS "Sold"
FROM orders
WHERE customer = 'Wile E. Coyote';
```

Limit the output to one customer with sales of over 50 units:

```sql
SELECT customer AS customer, ROUND(price,2), amount AS "Sold"
FROM orders
WHERE customer = 'Daffy' AND amount > 50;
```
5. Create data to enrich the streaming data from the Kafka topic. To demonstrate enrichment, we use a Hazelcast IMap. Create a mapping to a Hazelcast Map that stores a long as key and JSON as value:

```sql
CREATE or REPLACE MAPPING enrich (
    __key BIGINT,
    customer VARCHAR,
    colour1 VARCHAR,
    colour2 VARCHAR,
    colour3 VARCHAR )
TYPE IMap
OPTIONS (
    'keyFormat'='bigint',
    'valueFormat'='json-flat');
```

Let's add some data into this Map that we will use to enrich the messages on the topic.

```sql
INSERT INTO enrich VALUES
(1, 'Elmer Fudd', 'blue','red','green'),
(2, 'Speedy Gonzales', 'green', 'blue', 'blue'),
(3, 'Daffy', 'blue','green', 'blue'),
(4, 'Sylvester', 'blue','blue', 'blue'),
(5, 'Porky', 'blue', 'red', 'green'),
(6, 'Tweety', 'blue', 'red', 'blue'),
(7, 'Bugs Bunny', 'green', 'red', 'blue'),
(8, 'Marvin Martian', 'green', 'red', 'blue'),
(9, 'Wile E. Coyote', 'blue','green','red');
```
6. Enrich the stream with lookup data inside Hazelcast. The following SQL joins the streaming data from Kafka with the IMap stored in Hazelcast:

```sql
SELECT
    orders.customer AS Customer,
    enrich.colour1 as colour1,
    enrich.colour2 as colour2,
    enrich.colour3 as colour3,
    ROUND(orders.price,2) AS Price,
    orders.amount AS "Sold"
FROM orders
JOIN enrich
ON enrich.customer = orders.customer
AND enrich.colour2 = 'red';
```
7. While we can run this as an ad-hoc continuous query, we can also run it as a job that stores the result in an IMap. First, create a Mapping to this new IMap:

```sql
CREATE or REPLACE MAPPING destination_map(
    __key BIGINT,
    customer VARCHAR,
    colour1 VARCHAR,
    colour2 VARCHAR,
    colour3 VARCHAR,
    price DECIMAL,
    amount BIGINT  )
TYPE IMap
OPTIONS (
    'keyFormat'='bigint',
    'valueFormat'='json-flat');
```

Now let's push data into the IMap:

```sql
CREATE JOB IF NOT EXISTS push_orders_imap
AS
    SINK INTO destination_map (
        SELECT
            orders.id AS Id,
            orders.customer AS Customer,
            enrich.colour1 as colour1,
            enrich.colour2 as colour2,
            enrich.colour3 as colour3,
            ROUND(orders.price,2) AS Price,
            orders.amount AS "Sold"
        FROM orders
        JOIN enrich
        ON enrich.customer = orders.customer
        AND enrich.colour2 = 'red'
    );
```

Suspend the job when you are done with it:

```sql
ALTER JOB push_orders_imap SUSPEND;
```
8. We can also join two streams if needed. First, create an ordered view over the orders stream:

```sql
CREATE OR REPLACE VIEW orders_ordered AS
SELECT *
FROM TABLE(IMPOSE_ORDER(
    TABLE orders,
    DESCRIPTOR(order_ts),
    INTERVAL '0.5' SECONDS));
```

Then create a view that computes the high and low price per customer over 5-second tumbling windows:

```sql
CREATE OR REPLACE VIEW high_low AS
SELECT
     window_start,
     window_end,
     customer,
     ROUND(MAX(price),2) AS high,
     ROUND(MIN(price),2) AS low
FROM TABLE(TUMBLE(
     TABLE orders_ordered,
     DESCRIPTOR(order_ts),
     INTERVAL '5' SECONDS
))
GROUP BY 1,2,3;
```

Finally, join the ordered stream with the windowed aggregate:

```sql
SELECT
     tro.customer AS Customer,
     tro.price AS Price,
     hl.high AS High,
     hl.low AS Low
FROM orders_ordered AS tro
JOIN high_low AS hl
ON tro.customer = hl.customer
AND hl.window_end BETWEEN tro.order_ts AND tro.order_ts + INTERVAL '0.1' SECONDS;
```
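A tumbling window assigns each event to exactly one fixed, non-overlapping time slot and aggregates per slot. A small Python sketch of the high/low aggregation over `(timestamp_ms, customer, price)` tuples, to illustrate the windowing logic in isolation:

```python
from collections import defaultdict

WINDOW_MS = 5_000  # 5-second tumbling windows, as in the high_low view

def high_low(events):
    # events: iterable of (ts_millis, customer, price).
    # Key each event by its window start (timestamp rounded down to a
    # multiple of WINDOW_MS), then take max/min price per window+customer.
    buckets = defaultdict(list)
    for ts, customer, price in events:
        window_start = ts - ts % WINDOW_MS
        buckets[(window_start, customer)].append(price)
    return {k: (max(v), min(v)) for k, v in buckets.items()}

events = [(0, "Daffy", 10.0), (1_000, "Daffy", 14.0), (6_000, "Daffy", 7.0)]
print(high_low(events))
# first window: high 14.0, low 10.0; second window: high 7.0, low 7.0
```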

## Similarity Search

We will use a local Hazelcast deployment and Management Center along with a Python virtual environment. We will use pyenv to manage multiple Python versions on the local machine. For this demo we will use Python 3.11.

### Installations

This guide is written for a local installation of Hazelcast but can be extended to cloud and other installations.

1. Make sure Hazelcast is installed, via brew or another method.
2. Install pyenv: `brew install pyenv`, then `echo 'eval "$(pyenv init -)"' >> ~/.zshrc`.
3. Install the latest Python 3.11: `pyenv install 3.11`.
4. Change into the project directory and switch to Python 3.11: `cd ~/src/gids; pyenv local 3.11`.
5. Create a Python virtual environment and activate it: `python -m venv .venv`, then `source .venv/bin/activate`.
6. Install the Python dependencies: `pyenv exec pip install -U ipykernel qdrant-client sentence-transformers hazelcast-python-client jupyter`.
7. Run Qdrant: `docker run -p 6333:6333 -v $(pwd)/qdrant_storage:/qdrant/storage qdrant/qdrant`.
8. Start Hazelcast with `hz start` and, optionally, Management Center with `hz-mc start`.
9. Switch to the notebook: `jupyter notebook`.
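Under the hood, similarity search ranks stored vectors by their distance to a query vector; cosine similarity is the usual metric for sentence-transformers embeddings. A dependency-free sketch of just the ranking step, using tiny hypothetical 3-d "embeddings" (in the demo, the real vectors come from the model and Qdrant performs the search):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-d "embeddings"; real ones come from sentence-transformers.
docs = {"kafka": [1.0, 0.0, 0.2], "qdrant": [0.1, 1.0, 0.0]}
query = [0.9, 0.1, 0.1]

# Pick the stored vector most similar to the query.
best = max(docs, key=lambda name: cosine(docs[name], query))
print(best)  # kafka
```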

