
pmitra43/kafka_load_generator


Kafka Load Generator

JSON data generator that produces per-second CPU usage logs for the nodes of a cluster, with rule-based anomalies (usage > 80%).

kafka_load_generator generates stub per-second CPU usage logs for multiple nodes in a cluster and sends the data to a Kafka broker. Learn more about Kafka here.

The generated data does not represent actual CPU usage. Instead, it contains continuous sequences of 9 or 10 seconds during which a single node's CPU usage is above 80%. CPU usage values range from 1 to 100.

  • Some data will be <= 80, which is assumed to be normal data.
  • Some data will be > 80, which is considered anomalous.
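The normal/anomalous split above comes down to a single threshold test. A minimal Python sketch, assuming usage values are integers in the documented 1 to 100 range; `classify_usage` is a hypothetical helper, not part of the repository:

```python
ANOMALY_THRESHOLD = 80  # values strictly above this are treated as anomalous


def classify_usage(usage: int) -> str:
    """Classify one per-second CPU usage value (hypothetical helper)."""
    if not 1 <= usage <= 100:
        raise ValueError(f"usage must be between 1 and 100, got {usage}")
    # <= 80 is normal, > 80 is anomalous
    return "anomalous" if usage > ANOMALY_THRESHOLD else "normal"


if __name__ == "__main__":
    for value in (1, 80, 81, 100):
        print(value, classify_usage(value))
```

Note that 80 itself counts as normal; only values strictly greater than 80 are anomalous.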

For testing purposes, not all sequences are 10 seconds long; some are 9 seconds long.

  • If the first second of a sequence modulo 20 is 0, the sequence contains 10 seconds of anomalous data.
  • Otherwise (i.e. the first second modulo 20 is 10), the sequence contains 9 seconds of anomalous data.
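The length rule can be expressed as a small function of the sequence's first second. This is a sketch, not the generator's code: it assumes, as the rule implies, that sequences only start on seconds that are multiples of 10, and it treats "first second" as a plain integer second counter (the README does not say whether it is an epoch timestamp or a second-of-minute value; the modulo arithmetic is the same either way). `anomaly_sequence_length` is a hypothetical name.

```python
def anomaly_sequence_length(first_second: int) -> int:
    """Return the length in seconds of an anomalous sequence (hypothetical helper).

    Assumes sequences start only on seconds that are multiples of 10.
    """
    remainder = first_second % 20
    if remainder == 0:
        return 10  # e.g. a sequence starting at second 40 lasts 10 s
    if remainder == 10:
        return 9   # e.g. a sequence starting at second 30 lasts 9 s
    raise ValueError(f"sequence start {first_second} is not a multiple of 10")


if __name__ == "__main__":
    for start in (0, 10, 20, 30):
        print(start, anomaly_sequence_length(start))
```

Because a sequence is at most 10 seconds long and starts fall on multiples of 10, one sequence always ends before the next possible start.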

Anomalous data is generated in sequences of 9 or 10 seconds, which may also be consecutive. The gap between two sequences is random.
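To illustrate how these rules combine for one node, here is a hypothetical simulation, not the generator's actual logic. Assumptions: candidate starts are multiples of 10, each start independently begins an anomalous sequence with probability `start_probability` (which produces both random gaps and back-to-back sequences), normal values are drawn uniformly from 1 to 80, and anomalous values uniformly from 81 to 100. The real value distribution is not documented.

```python
import random
from typing import List, Optional, Tuple


def anomaly_sequence_length(first_second: int) -> int:
    # 10 s when the start is divisible by 20, otherwise 9 s (start % 20 == 10)
    return 10 if first_second % 20 == 0 else 9


def simulate_node(
    duration: int, start_probability: float = 0.3, seed: Optional[int] = None
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Simulate per-second CPU usage for one node (hypothetical sketch).

    Returns (usage, sequences): usage[t] is the value for second t and
    sequences lists (start, length) for each anomalous run.
    """
    rng = random.Random(seed)
    usage = [rng.randint(1, 80) for _ in range(duration)]  # normal baseline
    sequences = []
    # A 9 or 10 s run starting on a multiple of 10 never overlaps the next start.
    for start in range(0, duration, 10):
        if rng.random() < start_probability:  # random gaps between sequences
            length = min(anomaly_sequence_length(start), duration - start)
            for t in range(start, start + length):
                usage[t] = rng.randint(81, 100)  # anomalous values
            sequences.append((start, length))
    return usage, sequences


if __name__ == "__main__":
    _, runs = simulate_node(60, seed=42)
    print("anomalous runs (start, length):", runs)
```

With `start_probability=1.0` every multiple of 10 starts a sequence, so the output alternates 10 s and 9 s runs with no gaps, which is the "consecutive" case described above.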

kafka_load_generator uses json-data-generator to generate the JSON records and publish them to a Kafka broker. It can generate stub CPU logs for multiple nodes of the same cluster. If scaling up the number of nodes causes performance issues on one machine, you can run multiple instances of the generator on different machines, each with its own firstNode and lastNode configured in clusterConfig.json.

Example:

Suppose you want to generate data for 2000 nodes belonging to the same cluster, but your machine can only generate up to 1000 data points per second before running into performance issues. You can use two machines instead, with the following clusterConfig.json files. Notice the firstNode and lastNode values:

Machine 1
{
  "topic": "cpu-usage",

  "broker": {
    "server": "127.0.0.1",
    "port": 9092
  },

  "cluster": {
    "clusterName": "firstCluster",
    "firstNode": 1,
    "lastNode": 1000
  }
}
Machine 2
{
  "topic": "cpu-usage",

  "broker": {
    "server": "127.0.0.1",
    "port": 9092
  },

  "cluster": {
    "clusterName": "firstCluster",
    "firstNode": 1001,
    "lastNode": 2000
  }
}
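Choosing firstNode and lastNode for each machine amounts to splitting the node range into contiguous, non-overlapping blocks. A minimal Python sketch; `split_nodes` and `make_cluster_config` are hypothetical helpers, not part of the repository, and the even split assumes every machine has similar capacity:

```python
import json
from typing import Any, Dict, List, Tuple


def split_nodes(first_node: int, last_node: int, machines: int) -> List[Tuple[int, int]]:
    """Split an inclusive node range into contiguous per-machine (first, last) ranges."""
    total = last_node - first_node + 1
    if machines < 1 or total < machines:
        raise ValueError("need at least one machine and one node per machine")
    base, extra = divmod(total, machines)
    ranges = []
    start = first_node
    for i in range(machines):
        size = base + (1 if i < extra else 0)  # spread any remainder over the first machines
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def make_cluster_config(topic: str, server: str, port: int, cluster_name: str,
                        first_node: int, last_node: int) -> Dict[str, Any]:
    """Build a dict with the same shape as clusterConfig.json."""
    return {
        "topic": topic,
        "broker": {"server": server, "port": port},
        "cluster": {"clusterName": cluster_name, "firstNode": first_node, "lastNode": last_node},
    }


if __name__ == "__main__":
    for machine, (first, last) in enumerate(split_nodes(1, 2000, 2), start=1):
        config = make_cluster_config("cpu-usage", "127.0.0.1", 9092, "firstCluster", first, last)
        print(f"Machine {machine}")
        print(json.dumps(config, indent=2))
```

For 2000 nodes on two machines this yields the same firstNode/lastNode pairs as the Machine 1 and Machine 2 files above. Spreading the remainder over the first machines keeps node counts within one of each other when the range does not divide evenly.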

Cluster Configuration

clusterConfig.json is the main configuration file. It contains:

  • topic - Name of the Kafka topic to publish to.
  • broker - Details of the Kafka broker:

    | Variable | Definition |
    | --- | --- |
    | server | IP address of the Kafka broker (also referred to as the bootstrap server). Currently only one broker can be specified. |
    | port | Port on which the Kafka broker is listening. Only one port can be specified. |

  • cluster - Details of the cluster:

    | Variable | Definition |
    | --- | --- |
    | clusterName | Name of the cluster to which all the nodes belong. It is the same for all nodes. |
    | firstNode | Identity number of the first node for which data is generated. |
    | lastNode | Identity number of the last node for which data is generated. |
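The README does not say what configGenerator.py checks, so a pre-flight check of the fields listed above can catch mistakes early. A minimal sketch; `validate_cluster_config` and `load_cluster_config` are hypothetical helpers, and the only rules enforced are the ones implied by the tables plus the valid TCP port range:

```python
import json
from typing import Any, Dict


def validate_cluster_config(config: Dict[str, Any]) -> None:
    """Raise ValueError if config does not match the documented fields (hypothetical check)."""
    if not isinstance(config.get("topic"), str) or not config["topic"]:
        raise ValueError("topic must be a non-empty string")
    broker = config.get("broker") or {}
    if not isinstance(broker.get("server"), str):
        raise ValueError("broker.server must be a string, e.g. an IP address")
    port = broker.get("port")
    if not isinstance(port, int) or not 1 <= port <= 65535:
        raise ValueError("broker.port must be an integer between 1 and 65535")
    cluster = config.get("cluster") or {}
    if not isinstance(cluster.get("clusterName"), str):
        raise ValueError("cluster.clusterName must be a string")
    first, last = cluster.get("firstNode"), cluster.get("lastNode")
    if not isinstance(first, int) or not isinstance(last, int) or first > last:
        raise ValueError("firstNode and lastNode must be integers with firstNode <= lastNode")


def load_cluster_config(path: str = "clusterConfig.json") -> Dict[str, Any]:
    """Read clusterConfig.json and validate it before generating workflow files."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    validate_cluster_config(config)
    return config
```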

To execute:

  1. Edit clusterConfig.json as described above.
  2. Run python3 configGenerator.py in a terminal. If there are no errors, it generates two types of JSON files inside json-data-generator/conf:
      • cpuUsageConfig_clusterName.json
      • n files named cpuUsageWorkflow_clusterName_nodeNumber.json, where n = lastNode - firstNode + 1 and nodeNumber ranges from firstNode to lastNode.
  3. Run java -jar json-data-generator-1.2.2-SNAPSHOT/json-data-generator-1.2.2-SNAPSHOT.jar cpuUsageConfig_clusterName.json, replacing clusterName with the clusterName value from clusterConfig.json.
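Before step 3, it can help to confirm that step 2 produced every expected file. A hypothetical sketch based only on the naming pattern described above; it assumes nodeNumber is written as a plain integer without zero padding, which the README does not state explicitly:

```python
import os
from typing import List


def expected_generated_files(cluster_name: str, first_node: int, last_node: int) -> List[str]:
    """File names configGenerator.py is described as writing (naming assumed from the README)."""
    files = [f"cpuUsageConfig_{cluster_name}.json"]
    # One workflow file per node, nodeNumber running from firstNode to lastNode inclusive
    files += [
        f"cpuUsageWorkflow_{cluster_name}_{node}.json"
        for node in range(first_node, last_node + 1)
    ]
    return files


def missing_files(conf_dir: str, names: List[str]) -> List[str]:
    """Return the expected names that are not present as files in conf_dir."""
    return [name for name in names if not os.path.isfile(os.path.join(conf_dir, name))]


if __name__ == "__main__":
    names = expected_generated_files("firstCluster", 1, 1000)
    missing = missing_files("json-data-generator/conf", names)
    print(f"{len(names) - len(missing)} of {len(names)} expected files present")
```

For the Machine 1 example this expects 1001 files: one config file plus 1000 workflow files.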

Note: The Kafka broker must be running before you execute the jar.
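A quick way to check this is to open a TCP connection to the broker's server and port. A minimal sketch using only the Python standard library; `broker_reachable` is a hypothetical helper, and a successful connection only shows that something is listening on that port, not that it is a healthy Kafka broker:

```python
import socket


def broker_reachable(server: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to server:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((server, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unresolvable host, ...
        return False


if __name__ == "__main__":
    # Values from the clusterConfig.json example above
    ok = broker_reachable("127.0.0.1", 9092)
    print("broker reachable" if ok else "broker not reachable: start Kafka before running the jar")
```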

Prerequisites:

  1. Python 3
  2. Java 8

To do:

  • A script that takes broker and cluster details as arguments and starts the producer.
  • Containerize kafka_load_generator.
