Apache Kafka introduction

What is Apache Kafka?

Apache Kafka is a distributed event streaming platform built around an immutable, append-only commit log. Producers write events to the log, the log durably retains them, and any number of consumers read those events independently at their own pace. This deceptively simple design scales to millions of events per second and underpins the data backbones of companies like LinkedIn (where it was created), Netflix, and Uber.

A log, not a queue

The mental shift that unlocks Kafka is this: it is a log, not a queue. In a traditional queue, a message is typically deleted once a consumer has acknowledged it. In Kafka, events are appended to the log and retained for a configured period (or up to a configured size) regardless of who has read them. Each consumer tracks its own position (offset) in the log, so multiple independent consumers can read and replay the same stream for analytics, search indexing, auditing, and more, all from one source of truth.
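The log-versus-queue idea can be sketched with a toy in-memory model. This is only an illustration, not the Kafka client API: the `Log` and `Consumer` classes and the event names are hypothetical, and a real cluster persists and replicates the log on brokers.

```python
class Log:
    """Toy append-only log (hypothetical, not a Kafka API)."""

    def __init__(self):
        self._events = []

    def append(self, event):
        """Append an event and return the offset it was stored at."""
        self._events.append(event)
        return len(self._events) - 1

    def read(self, offset, max_events=10):
        """Return events starting at `offset`; reading deletes nothing."""
        return self._events[offset:offset + max_events]


class Consumer:
    """Each consumer keeps its own position in the shared log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_events=10):
        events = self.log.read(self.offset, max_events)
        self.offset += len(events)  # advance only this consumer's position
        return events

    def seek(self, offset):
        """Move the position, for example back to 0 to replay history."""
        self.offset = offset


log = Log()
for event in ["order-created", "order-paid", "order-shipped"]:
    log.append(event)

analytics = Consumer(log)
search = Consumer(log)

first = analytics.poll(max_events=2)  # analytics has read two events so far
everything = search.poll()            # search independently reads all three
analytics.seek(0)                     # rewind analytics to the start
replayed = analytics.poll()           # the full history is still available
```

Because reading never removes anything, `replayed` contains the same three events that `search` saw, even though `analytics` had already consumed two of them.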

Core use cases

Event-driven microservices — services communicate by publishing and subscribing to events instead of synchronous calls.
Real-time stream processing — fraud detection, recommendations, and metrics computed as data arrives.
Data integration pipelines — moving data between databases, warehouses, and search systems via Kafka Connect.
Log and metrics aggregation — a unified, high-throughput pipeline for observability data.
Event sourcing — the log itself becomes the system of record.

The architectural model

A Kafka cluster is made of brokers (servers). Events are organized into topics, and each topic is split into ordered partitions that are distributed across brokers for scale and replicated for fault tolerance. Producers write to topics; consumers, organized into consumer groups, read from them.
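The text above does not say how a producer chooses a partition. One common approach, assumed here, is to hash a record key so that all events with the same key land in the same partition. The sketch below uses CRC32 from the standard library as a stand-in hash (Kafka's Java client uses a murmur2 hash for keyed records); `NUM_PARTITIONS`, `partition_for`, and `produce` are hypothetical names for illustration.

```python
import zlib

NUM_PARTITIONS = 3  # hypothetical partition count for one topic

# One list per partition stands in for the per-partition logs.
partitions = [[] for _ in range(NUM_PARTITIONS)]


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition using a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(key, value):
    """Append to the key's partition and return (partition, offset)."""
    p = partition_for(key)
    partitions[p].append((key, value))
    return p, len(partitions[p]) - 1


p1, o1 = produce("customer-42", "created")
produce("customer-7", "created")
p2, o2 = produce("customer-42", "paid")

# All events for customer-42 sit in one partition, in the order produced.
history = [v for k, v in partitions[p1] if k == "customer-42"]
```

Keying by an entity identifier is a design choice: it keeps each entity's events in one ordered partition while still spreading different entities across the cluster.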

                  +------------------------------------+
 Producers        |           Kafka Cluster            |      Consumers
 +--------+       |                                    |
 | App A  |-----> |  Topic: orders                     |      +-------------+
 +--------+       |  +------- Partition 0 --------+    |      | Group X     |
                  |  | e0 | e1 | e2 | e3 | ...    |--->|----> | (svc 1, 2)  |
 +--------+       |  +----------------------------+    |      +-------------+
 | App B  |-----> |  +------- Partition 1 --------+    |      +-------------+
 +--------+       |  | e0 | e1 | e2 | ...         |--->|----> | Group Y     |
                  |  +----------------------------+    |      | (analytics) |
                  |  (partitions spread across         |      +-------------+
                  |   brokers, each replicated)        |
                  +------------------------------------+

Every partition is an ordered, immutable sequence, and each event in it has a monotonically increasing offset. Ordering is guaranteed within a partition but not across partitions of the same topic. Partitions enable horizontal scaling because different consumers in a group read different partitions in parallel.
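How a consumer group divides partitions can be sketched as a simple round-robin assignment. This is a simplified illustration under stated assumptions, not Kafka's actual group coordination protocol: `assign_partitions` and the member names are hypothetical. The property it shows is that each partition is read by exactly one member of a group.

```python
def assign_partitions(partitions, members):
    """Spread partitions across group members round-robin.

    Each partition is given to exactly one member of the group.
    """
    ordered = sorted(members)
    if not ordered:
        raise ValueError("a consumer group needs at least one member")
    assignment = {member: [] for member in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


# Four partitions shared by a two-member group.
group_x = assign_partitions(range(4), ["svc-1", "svc-2"])
# group_x == {"svc-1": [0, 2], "svc-2": [1, 3]}

# More members than partitions: the extra member has nothing to read.
scaled = assign_partitions(range(4), ["svc-1", "svc-2", "svc-3", "svc-4", "svc-5"])
# scaled["svc-5"] == []
```

The second call shows why the partition count caps a group's parallelism: adding consumers beyond the number of partitions leaves the extras idle.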

Kafka vs traditional message queues

Aspect	Apache Kafka	Traditional message queue
Storage model	Durable, replayable log	Transient queue
Message retention	Time/size-based; survives consumption	Deleted once acknowledged
Consumption	Many independent consumers, each with own offset	Typically one consumer per message
Replay	Yes — reset offset to re-read history	Generally no
Throughput	Very high (millions/sec)	Moderate
Ordering	Guaranteed per partition	Per queue, often weaker under scale
Scaling unit	Partitions	Queues / prefetch tuning
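The retention row in the table can be illustrated with a toy time-based policy: events leave the log because they are old, not because someone read them. This is a sketch under simplifying assumptions. `RetainedLog` is a hypothetical class, timestamps are passed in by hand, and events are expired one at a time, whereas Kafka removes whole log segments according to topic settings such as `retention.ms` and `retention.bytes`.

```python
from collections import deque


class RetainedLog:
    """Toy log with time-based retention (hypothetical, not a Kafka API)."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._events = deque()  # entries are (offset, timestamp, value)
        self._next_offset = 0

    def append(self, value, now):
        self._events.append((self._next_offset, now, value))
        self._next_offset += 1

    def enforce_retention(self, now):
        """Drop events older than the retention window, oldest first."""
        while self._events and now - self._events[0][1] > self.retention_seconds:
            self._events.popleft()

    def read_from(self, offset):
        """Return (offset, value) pairs at or after `offset`."""
        return [(o, v) for o, _, v in self._events if o >= offset]


retained = RetainedLog(retention_seconds=60)
retained.append("a", now=0)
retained.append("b", now=30)
retained.append("c", now=70)

first_read = retained.read_from(0)   # reading removes nothing
second_read = retained.read_from(0)  # so a second read sees the same events

retained.enforce_retention(now=80)   # "a" is 80s old and expires; "b" is 50s old
after_expiry = retained.read_from(0)
# after_expiry == [(1, "b"), (2, "c")]
```

Offsets are not reused after expiry: the surviving events keep offsets 1 and 2.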

Kafka is not a drop-in replacement for every queue. If you need per-message priority, fine-grained TTLs, or complex routing, a broker like RabbitMQ may fit better. Kafka shines when you need durable, replayable, high-throughput event streams.

Modern Kafka also runs without ZooKeeper thanks to KRaft mode, which moves cluster metadata management into Kafka itself, so there is no separate ZooKeeper ensemble to deploy and operate. The next page shows how to get a cluster running locally.