What is Apache Kafka?
Apache Kafka is a distributed event streaming platform built around an immutable, append-only commit log. Producers write events to the log, the log durably retains them, and any number of consumers read those events independently at their own pace. This deceptively simple design scales to millions of events per second and underpins the data backbones of companies like LinkedIn (where it was created), Netflix, and Uber.
A log, not a queue
The mental shift that unlocks Kafka is this: it is a log, not a queue. In a traditional queue, a message is consumed and deleted. In Kafka, events are appended to the log and retained for a configured period regardless of who has read them. Each consumer tracks its own position (offset) in the log, so multiple independent consumers can replay the same stream — for analytics, search indexing, auditing, and more — all from one source of truth.
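To make the log-versus-queue distinction concrete, here is a minimal in-memory sketch. It is a toy model, not the Kafka client API: `ToyLog`, `poll`, and `seek` are hypothetical names chosen for illustration. It shows only that reading does not delete anything and that each consumer advances its own offset.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """In-memory stand-in for one append-only log (not a Kafka client)."""
    events: list = field(default_factory=list)
    offsets: dict = field(default_factory=dict)  # consumer name -> next offset to read

    def append(self, event):
        """Append an event and return the offset it was stored at."""
        self.events.append(event)
        return len(self.events) - 1

    def poll(self, consumer, max_events=10):
        """Return the next events for this consumer; only its own offset moves."""
        start = self.offsets.get(consumer, 0)
        batch = self.events[start:start + max_events]
        self.offsets[consumer] = start + len(batch)
        return batch

    def seek(self, consumer, offset):
        """Reposition a consumer, e.g. back to 0 to replay history."""
        self.offsets[consumer] = offset


log = ToyLog()
for event in ["order-created", "order-paid", "order-shipped"]:
    log.append(event)

analytics_first = log.poll("analytics")                  # reads all three events
search_first = log.poll("search-indexer", max_events=2)  # independent position
log.seek("analytics", 0)                                 # rewind one consumer only
analytics_replay = log.poll("analytics")                 # same events again
```

Nothing is removed from `events` when a consumer reads, which is why the rewind works and why the second consumer is unaffected by the first.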
Core use cases
- Event-driven microservices — services communicate by publishing and subscribing to events instead of synchronous calls.
- Real-time stream processing — fraud detection, recommendations, and metrics computed as data arrives.
- Data integration pipelines — moving data between databases, warehouses, and search systems via Kafka Connect.
- Log and metrics aggregation — a unified, high-throughput pipeline for observability data.
- Event sourcing — the log itself becomes the system of record.
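The event sourcing case can be sketched as a fold over the log: current state is whatever you get by replaying every event from the start. The event shape and the `apply` and `replay` helpers below are assumptions made for illustration, and the list stands in for events read from a topic.

```python
def apply(balances, event):
    """Apply one event to the current state and return the new state."""
    account, amount = event["account"], event["amount"]
    updated = dict(balances)  # copy so earlier states are never mutated
    if event["type"] == "deposited":
        updated[account] = updated.get(account, 0) + amount
    elif event["type"] == "withdrawn":
        updated[account] = updated.get(account, 0) - amount
    return updated


def replay(events):
    """Rebuild state from scratch by applying events in log order."""
    state = {}
    for event in events:
        state = apply(state, event)
    return state


# Hypothetical events, in the order they were appended to the log.
event_log = [
    {"type": "deposited", "account": "alice", "amount": 100},
    {"type": "withdrawn", "account": "alice", "amount": 30},
    {"type": "deposited", "account": "bob", "amount": 50},
]

current = replay(event_log)          # state after every event
as_of_first = replay(event_log[:1])  # state as of an earlier point in the log
```

Because the log is the system of record, state at any earlier point can be recovered by replaying a prefix of it.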
The architectural model
A Kafka cluster is made of brokers (servers). Events are organized into topics, and each topic is split into ordered partitions that are distributed across brokers for scale and replicated for fault tolerance. Producers write to topics; consumers, organized into consumer groups, read from them.
                   +-----------------------------------+
    Producers      |           Kafka Cluster           |      Consumers
    +--------+     |                                   |     +------------+
    | App A  |---->|  Topic: orders                    |     | Group X    |
    +--------+     |  +------ Partition 0 -------+     |     | (svc 1,2)  |
    +--------+     |  | e0 | e1 | e2 | e3 | ...  |-----|---->+------------+
    | App B  |---->|  +--------------------------+     |
    +--------+     |  +------ Partition 1 -------+     |     +------------+
                   |  | e0 | e1 | e2 | ...       |-----|---->| Group Y    |
                   |  +--------------------------+     |     | (analytics)|
                   |  (partitions spread across        |     +------------+
                   |   brokers, each replicated)       |
                   +-----------------------------------+
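Every partition is an ordered, immutable sequence of events, and each event in it is assigned a monotonically increasing offset. Ordering is guaranteed within a partition but not across the partitions of a topic, so events that must stay in order should land in the same partition (typically by sharing a key). Partitions also enable horizontal scaling: within a consumer group, each partition is read by exactly one consumer, so the group's members process different partitions in parallel.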
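The two ideas in this paragraph, per-partition ordering and parallel reads within a group, can be sketched in a few lines. This is a toy model under stated assumptions: `zlib.crc32` is a stand-in hash (the Java client's default partitioner hashes key bytes with murmur2), and the round-robin `assign` function is a simplification standing in for Kafka's real assignment strategies. All names here are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # assumed partition count for the toy topic


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition. crc32 is a stand-in, not Kafka's hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions = defaultdict(list)  # partition id -> ordered list of (key, value)


def send(key, value):
    """Append to the key's partition; return (partition, offset)."""
    p = partition_for(key)
    partitions[p].append((key, value))
    return p, len(partitions[p]) - 1


def assign(partition_ids, consumers):
    """Round-robin: every partition goes to exactly one consumer in the group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partition_ids)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


# Events with the same key always hash to the same partition, so their
# relative order is preserved there.
for step in ["created", "paid", "shipped"]:
    send("order-42", step)
send("order-7", "created")

group_x = assign(range(NUM_PARTITIONS), ["svc-1", "svc-2"])
```

Adding a consumer to the group spreads the same partitions over more readers, which is where the parallelism comes from; a group with more consumers than partitions would leave some consumers idle.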
Kafka vs traditional message queues
| Aspect | Apache Kafka | Traditional message queue |
|---|---|---|
| Storage model | Durable, replayable log | Transient queue |
| Message retention | Time/size-based; survives consumption | Deleted once acknowledged |
| Consumption | Many independent consumers, each with own offset | Typically one consumer per message |
| Replay | Yes — reset offset to re-read history | Generally no |
| Throughput | Very high (millions of events/sec at cluster scale) | Moderate |
| Ordering | Guaranteed per partition | Per queue, often weaker under scale |
| Scaling unit | Partitions | Queues / prefetch tuning |
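The retention row is the one that most often surprises people coming from queues, so here is a small sketch of time-based retention. It is a simplified model: `RetainedLog` and its methods are hypothetical, timestamps are plain numbers passed in by the caller, and records are dropped one at a time, whereas real Kafka applies retention (for example the `retention.ms` and `retention.bytes` topic settings) to whole log segments.

```python
from collections import deque


class RetainedLog:
    """Toy log whose records expire by age, not by being consumed."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.records = deque()  # entries are (offset, timestamp, value)
        self.next_offset = 0

    def append(self, value, now):
        self.records.append((self.next_offset, now, value))
        self.next_offset += 1

    def enforce_retention(self, now):
        """Drop records older than the window, whether or not anyone read them."""
        while self.records and now - self.records[0][1] > self.retention_seconds:
            self.records.popleft()

    def read_from(self, offset):
        """Return (offset, value) pairs at or after the requested offset."""
        return [(o, v) for o, _, v in self.records if o >= offset]


retained = RetainedLog(retention_seconds=60)
retained.append("a", now=0)
retained.append("b", now=30)
retained.append("c", now=90)

before = retained.read_from(0)        # all three records are still available
retained.enforce_retention(now=100)   # "a" and "b" are now older than 60s
after = retained.read_from(0)         # only "c" remains, still at offset 2
```

Offsets are not renumbered after cleanup, so a consumer's saved position stays meaningful; a consumer that asks for an offset older than what is retained simply starts from the earliest surviving record in this sketch.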
Kafka is not a drop-in replacement for every queue. If you need per-message priority, fine-grained TTLs, or complex routing, a broker like RabbitMQ may fit better. Kafka shines when you need durable, replayable, high-throughput event streams.
Modern Kafka runs without ZooKeeper: KRaft mode moves cluster metadata management into Kafka itself, so there is no separate coordination service to deploy and operate. The next page shows how to get a cluster running locally.