Topics, Partitions & Offsets
Topics, partitions, and offsets are the foundation of everything Kafka does. Understanding how they interact explains Kafka’s scalability, its ordering guarantees, and the subtle trade-offs you make when choosing partition counts and keys.
Topics and partitions
A topic is a named stream of events — think orders or user.signups. A topic is divided into one or more partitions, and each partition is an independent, ordered, append-only log living on a broker. Partitions are the unit of parallelism and scale: a topic with 12 partitions can be consumed by up to 12 consumers in a group simultaneously.
Topic "orders"
Partition 0: [ off0 ][ off1 ][ off2 ][ off3 ] --> appended right
Partition 1: [ off0 ][ off1 ]
Partition 2: [ off0 ][ off1 ][ off2 ]
Offsets
An offset is the position of an event within its partition — a monotonically increasing integer assigned at write time. Offsets are unique per partition (partition 0’s offset 5 is unrelated to partition 1’s offset 5). Consumers track which offset they have processed, and commit it back to Kafka so they can resume after a restart. Because Kafka retains events, a consumer can also rewind by resetting its offset to replay history.
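The offset mechanics above can be sketched with a small in-memory model. This is a toy stand-in, not a Kafka client: the `ToyTopic` class and its method names are hypothetical, and it stores the committed position as the next offset to read, which mirrors Kafka's own convention.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a topic: one append-only list per partition."""

    def __init__(self, partition_count):
        self.partitions = [[] for _ in range(partition_count)]
        # Committed positions keyed by (group, partition): the next offset to read.
        self.committed = defaultdict(int)

    def append(self, partition, event):
        """Append an event and return the offset it was assigned."""
        log = self.partitions[partition]
        log.append(event)
        return len(log) - 1

    def read_from(self, group, partition):
        """Return the events from the group's committed position onward."""
        start = self.committed[(group, partition)]
        return self.partitions[partition][start:]

    def commit(self, group, partition, next_offset):
        """Record where the group should resume in this partition."""
        self.committed[(group, partition)] = next_offset


topic = ToyTopic(partition_count=2)
first = topic.append(0, "order-created")   # offset 0 in partition 0
second = topic.append(0, "order-paid")     # offset 1 in partition 0
other = topic.append(1, "order-created")   # offset 0 in partition 1, unrelated

# The group has processed both events in partition 0, so it resumes at offset 2.
topic.commit("billing", 0, next_offset=2)
after_restart = topic.read_from("billing", 0)

# Rewinding the committed position to 0 replays the retained history.
topic.commit("billing", 0, next_offset=0)
replayed = topic.read_from("billing", 0)
```

Each partition numbers its events independently, which is why `first` and `other` both receive offset 0.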
Replication and fault tolerance
Each partition has a replication factor: the number of copies kept across brokers. One replica is the leader, which handles all writes and, by default, all reads; the others are followers that replicate the leader and try to stay in sync. The set of replicas caught up with the leader is the in-sync replica (ISR) set. If the leader’s broker fails, an in-sync follower is promoted automatically.
| Concept | Meaning |
|---|---|
| Replication factor | Total copies of each partition (e.g. 3) |
| Leader | Replica serving reads/writes for the partition |
| Follower | Replica that replicates the leader |
| ISR | Replicas fully caught up with the leader |
A production rule of thumb is replication factor 3 with min.insync.replicas=2. This tolerates one broker failure while still acknowledging writes durably, provided producers send with acks=all.
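To see why this pairing tolerates exactly one failure, here is a minimal sketch of the acceptance rule, assuming the producer uses acks=all (the setting under which min.insync.replicas is enforced). The function `write_accepted` is a hypothetical helper for illustration, not a Kafka API.

```python
def write_accepted(isr_size, min_insync_replicas=2):
    """Toy check: with acks=all, a write succeeds only if enough replicas are in sync."""
    return isr_size >= min_insync_replicas


replication_factor = 3

# Map "number of failed brokers" to "are writes still accepted?".
results = {
    failed: write_accepted(replication_factor - failed)
    for failed in range(replication_factor)
}
```

With zero or one broker down the in-sync set still holds at least two replicas, so writes are acknowledged. With two down, only the leader remains and writes are rejected rather than stored on a single copy.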
Ordering guarantees
Kafka guarantees ordering within a single partition — never across partitions. Events in partition 0 are delivered in offset order; events spread across partitions 0, 1, and 2 have no global order. This is the single most important property to internalize when designing topics.
If you need all events for a given entity processed in order, you must route them to the same partition. Kafka does this for you via keys.
Keys and partitioning
When a producer sends an event with a key, the default partitioner hashes the key to choose a partition: partition = hash(key) % partitionCount. As long as the partition count stays the same, all events with the same key land in the same partition, preserving their relative order.
key="user-42" -> hash -> Partition 1 (always)
key="user-99" -> hash -> Partition 0 (always)
A common pattern is keying order events by customerId so every customer’s events stay ordered, while different customers spread across partitions for throughput. Events sent with a null key are distributed across partitions (round-robin / sticky batching) for even load.
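Kafka lets you add partitions to an existing topic but not remove them. Adding partitions breaks key-to-partition stability, because the modulo changes: new events for an existing key may land in a different partition than that key's earlier events. Choose partition counts deliberately up front.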
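The following sketch illustrates both points: a key maps to the same partition every time, and growing the partition count remaps some keys. It uses CRC32 as a stand-in hash purely for illustration; Kafka's default partitioner uses a different hash function (murmur2), so the partition numbers here will not match a real cluster. The function name `partition_for` is hypothetical.

```python
import zlib


def partition_for(key, partition_count):
    """Map a key to a partition with a stable hash (CRC32 here, not Kafka's murmur2).

    Python's built-in hash() is avoided because it is randomized per process.
    """
    return zlib.crc32(key.encode("utf-8")) % partition_count


keys = [f"user-{n}" for n in range(100)]

# Same key and same partition count: the result never changes.
stable = all(partition_for(k, 3) == partition_for(k, 3) for k in keys)

# Growing the topic from 3 to 4 partitions changes the modulo,
# so some keys now map to a different partition than before.
moved = [k for k in keys if partition_for(k, 3) != partition_for(k, 4)]
```

Any key in `moved` would have its older events in one partition and its newer events in another, so per-key ordering across the resize is no longer guaranteed.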
Consumer groups and rebalancing
A consumer group is a set of consumers cooperating to read a topic. Kafka assigns each partition to exactly one consumer in the group, so adding consumers (up to the partition count) increases parallelism. When a consumer joins or leaves, Kafka triggers a rebalance to redistribute partitions.
Topic with 3 partitions, group "billing" with 2 consumers:
Consumer A <- Partition 0, Partition 2
Consumer B <- Partition 1
Adding a third consumer gives each one partition; a fourth would sit idle, since partitions cannot be shared within a group.
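The assignment in the diagram can be reproduced with a simple round-robin sketch. This is a simplified assumption for illustration: Kafka's real assignors (range, round-robin, sticky, cooperative sticky) are more involved, and `assign_round_robin` is a hypothetical helper, not a client API.

```python
def assign_round_robin(partitions, consumers):
    """Deal partitions out to consumers one at a time, in order."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2]

two = assign_round_robin(partitions, ["A", "B"])
three = assign_round_robin(partitions, ["A", "B", "C"])
four = assign_round_robin(partitions, ["A", "B", "C", "D"])
```

With two consumers, A owns partitions 0 and 2 and B owns partition 1. With four consumers, D ends up with an empty list: every partition already has exactly one owner, so there is nothing left for it to read.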
Best Practices
- Pick partition counts for your target throughput and maximum consumer parallelism: Kafka does not support reducing a topic's partition count later.
- Use meaningful keys to preserve per-entity ordering; avoid keys that funnel traffic onto one partition (hot partitions).
- Set replication factor 3 and min.insync.replicas=2 in production for durability.
- Keep consumers in a group at or below the partition count so none sit idle.
- Commit offsets only after successful processing to avoid silent data loss.
- Minimize rebalances with static membership and tuned session timeouts in latency-sensitive groups.
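The commit-after-processing practice can be sketched as a small loop over an in-memory log. This is a toy model under stated assumptions, not a Kafka consumer: `consume`, `handler`, and the simulated failure are all hypothetical, and the committed value is the next offset to read.

```python
def consume(log, committed, handler):
    """Process events from `committed` onward, advancing only after each success.

    Returns the new committed position (the next offset to read).
    """
    offset = committed
    while offset < len(log):
        try:
            handler(log[offset])
        except Exception:
            # Stop without advancing: the failed event will be retried later.
            break
        offset += 1  # advance only after the handler succeeded
    return offset


log = ["e0", "e1", "e2", "e3"]
processed = []
fail_once = {"e2"}  # events that fail on their first attempt


def handler(event):
    if event in fail_once:
        fail_once.discard(event)
        raise RuntimeError("simulated crash")
    processed.append(event)


committed = consume(log, 0, handler)                      # stops at e2
committed_after_retry = consume(log, committed, handler)  # resumes at e2
```

Because the position only moves past an event once it has been handled, the failed event is retried instead of skipped. The trade-off is at-least-once delivery: if a handler performs side effects before failing, those side effects may happen again on retry.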