- A topic is a category or feed name to which records are published. Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that subscribe to the data written to it.
- A partition is a shard of a topic. Each partition is an ordered, immutable sequence of records that is continually appended to, forming a structured commit log. Each record in a partition is assigned a sequential ID number, called the offset, that uniquely identifies it within that partition.
- Kafka replicates the log for each topic's partitions across a configurable number of servers (you can set this replication factor on a topic-by-topic basis). This allows automatic failover to these replicas when a server in the cluster fails so messages remain available in the presence of failures.
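To make the partition and offset model above concrete, here is a minimal sketch of a single partition as an append-only log. The `PartitionLog` class and its methods are hypothetical names for illustration only; they are not part of any Kafka client API, and replication is left out entirely.

```python
class PartitionLog:
    """Toy model of one partition: an append-only list of records."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return its offset (its position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


log = PartitionLog()
first = log.append("order-created")
second = log.append("order-paid")
print(first, second)   # offsets are handed out sequentially, starting at 0
print(log.read(1))     # reading from offset 1 skips the first record
```

Existing records are never modified or reordered; a reader only needs to remember the offset it has reached.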
- ZooKeeper is a distributed key-value store. Kafka uses it to store metadata and to coordinate the cluster.
- broker info is stored in ZooKeeper
- get the list of broker IDs:
zookeeper-shell zookeeper:2181 ls /brokers/ids
- get detailed info:
zookeeper-shell zookeeper:2181 get /brokers/ids/1
- topic metadata is also stored in ZooKeeper
- the topic data itself is stored on disk, in the directory set by `log.dirs` in the broker properties file:
/etc/kafka/kafka.properties
log.dirs=/var/lib/kafka/data
kafka-topics --zookeeper zookeeper:2181 --list
kafka-topics --bootstrap-server broker:9092 --list
kafka-topics --bootstrap-server broker:9092 --describe --topic <topic>
- Time-based retention:
- Once a segment reaches the configured retention time, it is marked for deletion or compaction, depending on the configured cleanup policy. The default retention period for segments is 7 days.
log.retention.ms
log.retention.minutes
log.retention.hours
- Size-based retention:
- This policy sets the maximum size of the log for a topic partition. Once the log exceeds this size, Kafka starts deleting its oldest segments.
log.retention.bytes
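The two retention policies can be sketched as follows. This is a simplified model under stated assumptions, not the broker's actual implementation: `Segment`, `expired_by_time` and `expired_by_size` are hypothetical names, segments are assumed to be ordered oldest first, and special handling of the active segment is ignored.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    last_timestamp_ms: int   # timestamp of the newest record in the segment
    size_bytes: int


def expired_by_time(segments, now_ms, retention_ms):
    """Segments whose newest record is older than the retention window."""
    return [s for s in segments if now_ms - s.last_timestamp_ms > retention_ms]


def expired_by_size(segments, retention_bytes):
    """Oldest segments to drop while the log still keeps retention_bytes.

    `segments` must be ordered oldest first.
    """
    excess = sum(s.size_bytes for s in segments) - retention_bytes
    doomed = []
    for seg in segments:
        # Stop as soon as dropping this segment would leave the log
        # smaller than retention_bytes.
        if excess - seg.size_bytes < 0:
            break
        doomed.append(seg)
        excess -= seg.size_bytes
    return doomed


segments = [
    Segment("00000000.log", last_timestamp_ms=1_000, size_bytes=400),
    Segment("00000100.log", last_timestamp_ms=5_000, size_bytes=400),
    Segment("00000200.log", last_timestamp_ms=9_000, size_bytes=400),
]
old = expired_by_time(segments, now_ms=10_000, retention_ms=6_000)
big = expired_by_size(segments, retention_bytes=400)
print([s.name for s in old])
print([s.name for s in big])
```

In both cases retention works on whole segments, never on individual records, so some data can outlive the configured limit until its segment becomes eligible.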
- A consumer group is a set of related consumers that perform the same task.
- Each consumer in a group reads from its own partitions.
- Two consumers in one group can't read the same partition.
- Different consumer groups can read from different offsets in a partition
- Committed offsets for each group are stored in the internal `__consumer_offsets` topic (older ZooKeeper-based consumers kept them in ZooKeeper).
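The idea that each consumer group tracks its own position in a partition can be illustrated with a small sketch. `GroupOffsets` is a hypothetical in-memory stand-in for the offset store, and the group names are made up; starting from offset 0 when nothing has been committed is a simplification (in real clients this depends on the `auto.offset.reset` setting).

```python
class GroupOffsets:
    """Toy offset store: one committed offset per (group, partition)."""

    def __init__(self):
        self._committed = {}

    def commit(self, group, partition, offset):
        """Remember the next offset this group should read from."""
        self._committed[(group, partition)] = offset

    def position(self, group, partition):
        # Simplification: a group with no commit starts from offset 0.
        return self._committed.get((group, partition), 0)


records = ["r0", "r1", "r2", "r3"]   # contents of partition 0
offsets = GroupOffsets()
offsets.commit("billing", 0, 3)      # billing has processed r0..r2
offsets.commit("analytics", 0, 1)    # analytics has processed only r0

# Each group resumes from its own position in the same partition.
print(records[offsets.position("billing", 0):])
print(records[offsets.position("analytics", 0):])
```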
- list of consumer groups
kafka-consumer-groups --bootstrap-server localhost:9092 --list
- list of offsets
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group <group> --offsets
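- reset offsets (here to the earliest available offset for one topic; the group must be inactive, and without `--execute` the command only prints the planned result)
kafka-consumer-groups --bootstrap-server localhost:9092 --reset-offsets --group <group> --topic <topic> --to-earliest --execute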
- `kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group <group>`
How many consumers can read from one topic in the same consumer group? What happens if you run more consumers than there are partitions for the topic?
- Ideally, one consumer per partition of the topic.
- One consumer can read several partitions of one topic.
- If there are more consumers than partitions, the extra consumers stay idle and get no data.
- only one
- If one of the consumers dies, Kafka triggers a rebalance and reassigns its partitions to the remaining consumers.
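The answers above can be sketched with a toy assignment function. This is an illustrative round-robin scheme only; `assign` is a hypothetical helper, and Kafka's real assignors (range, round-robin, sticky) follow their own rules. What it does preserve is the key constraint: within a group, each partition has exactly one owner.

```python
def assign(partitions, consumers):
    """Spread partitions over consumers round-robin, one owner per partition."""
    if not consumers:
        return {}
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2]

# More consumers than partitions: c4 ends up with nothing to read.
before = assign(partitions, ["c1", "c2", "c3", "c4"])
print(before)

# c3 and c4 die: a rebalance hands their partitions to the survivors.
after = assign(partitions, ["c1", "c2"])
print(after)
```

Adding consumers beyond the partition count therefore adds no read parallelism; the spare consumers only act as standbys for the next rebalance.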
- Keys are used when records need to be written to partitions in a more controlled manner. The simplest such scheme is to generate a consistent hash of the key and then select the partition number by taking that hash modulo the total number of partitions in the topic. This ensures that records with the same key are always written to the same partition, as long as the number of partitions does not change.
- The key is particularly important if modeling a Kafka topic as a table in KSQL (or KTable in Kafka Streams) for query or join purposes.
- according to hash of key
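The hash-modulo scheme described above can be sketched in a few lines. Note the assumption: `zlib.crc32` is used here only as a convenient deterministic hash for the illustration, while Kafka's default partitioner uses a murmur2 hash, so the partition numbers below will not match what a real producer picks. `partition_for` is a hypothetical helper name.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Pick a partition from a deterministic hash of the key.

    crc32 stands in for the real hash function; only the
    hash-then-modulo idea is being illustrated.
    """
    return zlib.crc32(key) % num_partitions


num_partitions = 6
for key in [b"user-1", b"user-2", b"user-1"]:
    # The same key always maps to the same partition.
    print(key, partition_for(key, num_partitions))
```

Because the result depends on `num_partitions`, adding partitions to a topic can send an existing key to a different partition than before.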