Kafka: distributing the load

Anton Zelenin edited this page Nov 4, 2021 · 6 revisions

A single pipeline can consume about 1000 events per second (eps). If your Kafka topic carries more than 1000 eps, you need to split the load between pipelines so that data arrives at Anodot in real time. You can do that by partitioning the data inside the topic correctly.

Partitioning the Kafka topic

You can split the data into multiple partitions and use multiple consumer threads to read from the Kafka partitions in parallel.
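A minimal sketch of the idea (the function name and shape are hypothetical, not part of the agent's API): distribute partitions across consumer threads so that every partition is read by exactly one thread, which preserves per-partition order.

```python
def assign_partitions(num_partitions: int, num_threads: int) -> dict[int, list[int]]:
    """Round-robin assignment: thread t owns every partition p where p % num_threads == t.

    Each partition is read by exactly one thread, so records within a
    partition are processed in order.
    """
    assignment = {t: [] for t in range(num_threads)}
    for p in range(num_partitions):
        assignment[p % num_threads].append(p)
    return assignment

# With 6 partitions and 3 threads, each thread owns two whole partitions:
# assign_partitions(6, 3) → {0: [0, 3], 1: [1, 4], 2: [2, 5]}
```

Note that this only works cleanly when the number of partitions is at least the number of threads; otherwise some threads sit idle, and order can only be preserved per partition in any case.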

Managing the order of Kafka data records when streaming to Anodot

Kafka data records are converted to data points in Anodot metrics. Data points are processed in the order they arrive at Anodot - out-of-order data points (in the context of the same metric) are discarded.
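The discard rule can be illustrated with a small sketch (this is not Anodot's actual implementation, just the described behavior): for each metric, a point is kept only if its timestamp is not older than the last accepted point for that metric.

```python
def filter_ordered(points):
    """Keep only points arriving in non-decreasing timestamp order per metric.

    `points` is an iterable of (metric_id, timestamp, value) tuples in
    arrival order; out-of-order points for the same metric are dropped.
    """
    last_seen = {}   # metric_id -> latest accepted timestamp
    accepted = []
    for metric, ts, value in points:
        if ts >= last_seen.get(metric, float("-inf")):
            last_seen[metric] = ts
            accepted.append((metric, ts, value))
        # else: the point is out of order for this metric and is discarded
    return accepted
```

Note that ordering is tracked per metric, so interleaving points of different metrics is harmless; only a step backwards in time within one metric loses data.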

Kafka guarantees the order of records within a partition. To enable ordered processing of the Kafka records you need to make sure that:

  • The number of partitions is greater than or equal to the number of threads, so that each thread handles one or more whole partitions and therefore processes their records in order

  • The producers of a given combination of measurement and dimensions are storing such records to the same partition.

  • You do not use the transformations feature, because changing metrics after fetching them from Kafka may affect the ordering
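The second requirement is usually met by giving every record of a metric the same key, since Kafka's default partitioner sends equal keys to the same partition. The sketch below shows the idea; Kafka's Java client hashes keys with murmur2, and md5 is used here only as an illustrative stand-in — any stable hash gives the same property.

```python
import hashlib

def partition_for(measurement: str, dimensions: dict, num_partitions: int) -> int:
    """Derive a stable record key from the measurement plus sorted dimensions,
    then hash it to a partition number. Identical measurement/dimension
    combinations always map to the same partition."""
    key = measurement + "|" + "|".join(f"{k}={v}" for k, v in sorted(dimensions.items()))
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

In practice you would not compute the partition yourself: pass the same key bytes to your producer (for example, the `key=` argument of `KafkaProducer.send` in kafka-python) and let the client's partitioner do the mapping, so all data points of one metric stay in one partition and keep their order.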

[Figure: Kafka order — the order of message arrival to Anodot from a Kafka topic]

For additional information on consumers and ordering, please refer to the Kafka documentation.
