
Topics, Partitions & Offsets

Topics, partitions, and offsets are the foundation of everything Kafka does. Understanding how they interact explains Kafka’s scalability, its ordering guarantees, and the subtle trade-offs you make when choosing partition counts and keys.

Topics and partitions

A topic is a named stream of events — think orders or user.signups. A topic is divided into one or more partitions, and each partition is an independent, ordered, append-only log living on a broker. Partitions are the unit of parallelism and scale: a topic with 12 partitions can be consumed by up to 12 consumers in a group simultaneously.

Topic "orders"
  Partition 0:  [ off0 ][ off1 ][ off2 ][ off3 ]  --> appended right
  Partition 1:  [ off0 ][ off1 ]
  Partition 2:  [ off0 ][ off1 ][ off2 ]

Offsets

An offset is the position of an event within its partition: a monotonically increasing integer assigned at write time. Offsets are unique only within a partition (partition 0’s offset 5 is unrelated to partition 1’s offset 5). Consumers track how far they have read and commit that position back to Kafka (by convention, the offset of the next event to read) so they can resume after a restart. Because Kafka retains events until the topic’s retention policy removes them, a consumer can also rewind by resetting its offset to replay history.
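The offset mechanics above can be modeled in a few lines. This is a minimal in-memory sketch, not Kafka client code: the `Partition` and `Consumer` classes and their methods are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """An append-only log; an event's offset is its index in the list."""
    events: list = field(default_factory=list)

    def append(self, event):
        self.events.append(event)
        return len(self.events) - 1  # offset assigned at write time

    def read_from(self, offset):
        # Return (offset, event) pairs starting at the given offset.
        return list(enumerate(self.events))[offset:]


@dataclass
class Consumer:
    """Hypothetical helper: one committed position per partition id."""
    committed: dict = field(default_factory=dict)  # partition id -> next offset to read

    def poll(self, pid, partition):
        return partition.read_from(self.committed.get(pid, 0))

    def commit(self, pid, last_processed_offset):
        # Store the offset of the NEXT event to read.
        self.committed[pid] = last_processed_offset + 1

    def seek(self, pid, offset):
        self.committed[pid] = offset  # rewind (or skip ahead)


p0, p1 = Partition(), Partition()
for e in ["a", "b", "c"]:
    p0.append(e)
p1_first = p1.append("x")            # offsets start at 0 in every partition

consumer = Consumer()
batch = consumer.poll(0, p0)         # all three events, offsets 0..2
consumer.commit(0, batch[-1][0])     # committed position becomes 3
after_commit = consumer.poll(0, p0)  # nothing new to read
consumer.seek(0, 1)                  # rewind to offset 1
replayed = consumer.poll(0, p0)      # events at offsets 1 and 2 again
```

A real consumer stores committed positions in Kafka rather than in process memory, which is what lets it resume after a restart.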

Replication and fault tolerance

Each partition has a replication factor: the number of copies kept across brokers. One replica is the leader, which by default handles all reads and writes; the others are followers that replicate the leader's log. The set of replicas caught up with the leader is the in-sync replica (ISR) set. If the leader’s broker fails, an in-sync follower is promoted automatically.

Concept	Meaning
Replication factor	Total copies of each partition (e.g. 3)
Leader	Replica serving reads/writes for the partition
Follower	Replica that replicates the leader
ISR	Replicas fully caught up with the leader

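A production rule of thumb is replication factor 3 with min.insync.replicas=2 and producers configured with acks=all (the minimum is only enforced for acks=all writes). This tolerates one broker failure while still requiring every acknowledged write to exist on at least two replicas.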
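To see why this combination tolerates exactly one failure, here is a simplified sketch of the acknowledgement rule. The `can_acknowledge` function is a hypothetical model, not a Kafka API, and it assumes every failed broker hosted one replica of the partition.

```python
def can_acknowledge(isr_size, min_insync_replicas, acks="all"):
    """Return True if a write would be accepted under this simplified model."""
    if acks != "all":
        # With acks=0 or acks=1 the min.insync.replicas check is not applied;
        # the write only needs a live leader.
        return isr_size >= 1
    return isr_size >= min_insync_replicas


replication_factor = 3
min_isr = 2
for failed_brokers in range(replication_factor + 1):
    isr = replication_factor - failed_brokers
    outcome = "accepted" if can_acknowledge(isr, min_isr) else "rejected"
    print(f"{failed_brokers} broker(s) down, ISR={isr}: write {outcome}")
```

With one broker down the ISR still holds two replicas, so writes continue. With two down, acks=all writes are rejected rather than accepted with a single copy.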

Ordering guarantees

Kafka guarantees ordering within a single partition — never across partitions. Events in partition 0 are delivered in offset order; events spread across partitions 0, 1, and 2 have no global order. This is the single most important property to internalize when designing topics.

If you need all events for a given entity processed in order, you must route them to the same partition. Kafka does this for you via keys.

Keys and partitioning

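When a producer sends an event with a key, the default partitioner hashes the serialized key to choose a partition. Conceptually, partition = hash(key) % partitionCount (the Java client uses a murmur2 hash of the key bytes). All events with the same key land in the same partition for as long as the partition count stays the same, preserving their relative order.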

key="user-42" -> hash -> Partition 1   (always)
key="user-99" -> hash -> Partition 0   (always)
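The mapping can be sketched as follows. Note the assumptions: `choose_partition` is a hypothetical helper, and `zlib.crc32` stands in for the client's real hash, so the partition numbers it produces will not match a real cluster. Only the hash-then-modulo structure carries over.

```python
import zlib


def choose_partition(key: str, partition_count: int) -> int:
    """Hash the key bytes, then take the modulo of the partition count.

    zlib.crc32 is a stand-in hash for illustration only.
    """
    return zlib.crc32(key.encode("utf-8")) % partition_count


PARTITIONS = 3
log = {p: [] for p in range(PARTITIONS)}  # partition id -> appended events

events = [
    ("user-42", "login"),
    ("user-99", "login"),
    ("user-42", "add-to-cart"),
    ("user-42", "checkout"),
]
for key, value in events:
    log[choose_partition(key, PARTITIONS)].append((key, value))

# Every user-42 event sits in one partition, in the order it was sent.
home = choose_partition("user-42", PARTITIONS)
user42 = [value for key, value in log[home] if key == "user-42"]
```

Because the function is deterministic, repeated sends with the same key always resolve to the same partition, which is what preserves per-key ordering.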

A common pattern is keying order events by customerId so every customer’s events stay ordered, while different customers spread across partitions for throughput. Events sent with a null key are distributed across partitions (round-robin / sticky batching) for even load.

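Changing a topic’s partition count later breaks key-to-partition stability, because the modulo changes: Kafka only allows the count to be increased, existing events are not moved, and new events for a key may land in a different partition than its older ones. Choose partition counts deliberately up front.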
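A quick experiment shows how disruptive a change in the modulo is. This sketch uses `zlib.crc32` as a stand-in hash and made-up `customer-N` keys (both assumptions) and counts how many keys map to a different partition when a topic grows from 6 to 8 partitions.

```python
import zlib


def choose_partition(key, partition_count):
    # crc32 is a stand-in for the client's real hash (an assumption of this sketch).
    return zlib.crc32(key.encode("utf-8")) % partition_count


keys = [f"customer-{i}" for i in range(10_000)]  # hypothetical keys
moved = [k for k in keys if choose_partition(k, 6) != choose_partition(k, 8)]
moved_fraction = len(moved) / len(keys)
print(f"{moved_fraction:.0%} of keys map to a different partition after 6 -> 8")
```

For a uniformly distributed hash, only hash values whose remainder modulo 24 is below 6 keep the same partition, so roughly three quarters of keys would be expected to move in this case.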

Consumer groups and rebalancing

A consumer group is a set of consumers cooperating to read a topic. Kafka assigns each partition to exactly one consumer in the group, so adding consumers (up to the partition count) increases parallelism. When a consumer joins or leaves, Kafka triggers a rebalance to redistribute partitions.

Topic with 3 partitions, group "billing" with 2 consumers:
  Consumer A  <-  Partition 0, Partition 2
  Consumer B  <-  Partition 1

Adding a third consumer gives each one partition; a fourth would sit idle, since partitions cannot be shared within a group.
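The assignment in the diagram can be reproduced with a simple round-robin sketch. `assign_round_robin` is a hypothetical function written for illustration; Kafka's built-in assignors use their own strategies and may distribute partitions differently.

```python
def assign_round_robin(partitions, consumers):
    """Deal partitions out to consumers one at a time, in order."""
    assignment = {c: [] for c in consumers}
    if not consumers:
        return assignment
    for i, p in enumerate(sorted(partitions)):
        # Each partition goes to exactly one consumer in the group.
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


partitions = [0, 1, 2]
two = assign_round_robin(partitions, ["A", "B"])
three = assign_round_robin(partitions, ["A", "B", "C"])
four = assign_round_robin(partitions, ["A", "B", "C", "D"])

print(two)    # A gets partitions 0 and 2, B gets partition 1
print(three)  # one partition each
print(four)   # D receives an empty list and sits idle
```

Re-running the function with a different consumer list is a rough analogue of a rebalance: membership changes, and the partitions are redistributed.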

Best Practices

Pick partition counts for your target throughput and maximum consumer parallelism. Kafka does not support reducing a topic's partition count, and increasing it changes which partition each key maps to.
Use meaningful keys to preserve per-entity ordering; avoid keys that funnel traffic onto one partition (hot partitions).
Set replication factor 3 and min.insync.replicas=2 in production, with acks=all on producers, for durability.
Keep consumers in a group at or below the partition count so none sit idle.
Commit offsets only after successful processing to avoid silent data loss; expect occasional redelivery after a crash and keep processing idempotent.
Minimize rebalances with static membership and tuned session timeouts in latency-sensitive groups.
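The commit-ordering advice above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not client code: `simulate` is a hypothetical function, the consumer crashes once while handling the event at `crash_at`, and it then restarts from its last committed position.

```python
def simulate(events, commit_first, crash_at):
    """Return the events whose processing completed across a crash and restart."""
    processed = []
    committed = 0  # next offset to read after a restart

    # First run: stops when the consumer crashes at offset `crash_at`.
    for offset in range(committed, len(events)):
        if commit_first:
            committed = offset + 1       # commit BEFORE processing
        if offset == crash_at:
            break                        # crash: this event is never processed
        processed.append(events[offset])
        if not commit_first:
            committed = offset + 1       # commit AFTER processing

    # Restart: resume from the committed position.
    for offset in range(committed, len(events)):
        processed.append(events[offset])
    return processed


events = ["e0", "e1", "e2"]
lost = simulate(events, commit_first=True, crash_at=1)   # e1 is skipped
safe = simulate(events, commit_first=False, crash_at=1)  # e1 is retried
```

Committing first loses the event that was in flight during the crash. Committing afterwards retries it; in a real system a crash between processing and commit would cause the event to be processed twice, which is why this approach is described as at-least-once.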