Vortex is a real-time, distributed, fault-tolerant, highly-scalable, rapid-fast data stream processing software.
It can consume data from apache Kafka topics on-the-fly, process it using apache spark, including basic processing as well as statistical ML workloads, and stream it to an apache Ignite cluster to store as an in-memory data-grid, which can be persisted to disk as required.
Following are the steps to setup a minimal example with sample e-commerce data:
-
First run the ignite server
-
Then run the kafka zookeeper and the kafka server
-
Then run the kafka producer to generate the event stream
-
Then run the main spark app using:
vortex-venv/bin/spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 spark-app/spark_main.py