Apache Kafka introduction

What is Apache Kafka?

Apache Kafka is a distributed event streaming platform built around an immutable, append-only commit log. Producers write events to the log, the log durably retains them, and any number of consumers read those events independently at their own pace. This deceptively simple design scales to millions of events per second and underpins the data backbones of companies like LinkedIn (where it was created), Netflix, and Uber.

A log, not a queue

The mental shift that unlocks Kafka is this: it is a log, not a queue. In a traditional queue, a message is typically removed once a consumer acknowledges it. In Kafka, events are appended to the log and retained according to a configured policy (time- or size-based), regardless of who has read them. Each consumer tracks its own position (offset) in the log, so multiple independent consumers can read and replay the same stream for analytics, search indexing, auditing, and more, all from one source of truth.
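The difference is easier to see in a toy model. The following is a minimal in-memory sketch, not Kafka's client API: `EventLog`, `Consumer`, `poll`, and `seek` are hypothetical names chosen for illustration. It shows that reading never removes anything from the log and that each consumer advances or rewinds its own offset independently.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Toy in-memory append-only log (illustration only, not Kafka's API)."""
    _events: list = field(default_factory=list)

    def append(self, event) -> int:
        """Append an event and return the offset it was stored at."""
        self._events.append(event)
        return len(self._events) - 1

    def read(self, offset: int, max_events: int = 10) -> list:
        """Return up to max_events events starting at offset; nothing is deleted."""
        return self._events[offset:offset + max_events]


@dataclass
class Consumer:
    """Each consumer owns its position; the log keeps no per-reader state."""
    log: EventLog
    offset: int = 0

    def poll(self, max_events: int = 10) -> list:
        """Read the next batch and advance this consumer's offset."""
        batch = self.log.read(self.offset, max_events)
        self.offset += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        """Move the position, for example back to 0 to re-read history."""
        self.offset = offset


log = EventLog()
for name in ["order-created", "order-paid", "order-shipped"]:
    log.append(name)

analytics = Consumer(log)
search_indexer = Consumer(log)

first_pass = analytics.poll()                      # reads all three events
indexer_batch = search_indexer.poll(max_events=2)  # a slower reader, unaffected

analytics.seek(0)                                  # rewind and replay
replayed = analytics.poll()
```

After this runs, `replayed` holds the same three events as `first_pass`, while `search_indexer` is still at offset 2. A queue that deletes on acknowledgement could not serve the second reader or the replay.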

Core use cases

  • Event-driven microservices — services communicate by publishing and subscribing to events instead of synchronous calls.
  • Real-time stream processing — fraud detection, recommendations, and metrics computed as data arrives.
  • Data integration pipelines — moving data between databases, warehouses, and search systems via Kafka Connect.
  • Log and metrics aggregation — a unified, high-throughput pipeline for observability data.
  • Event sourcing — the log itself becomes the system of record.
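To make the event sourcing case concrete, here is a small sketch under stated assumptions: the account events, the `apply` transition function, and the `replay` helper are all hypothetical, and a plain Python list stands in for a Kafka topic partition. The idea it illustrates is that current state is never stored directly; it is derived by folding over the log.

```python
from typing import Optional

# Hypothetical account events; in Kafka these would be records in a topic.
events = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 5},
]


def apply(balance: int, event: dict) -> int:
    """Pure transition function: current state + event -> next state."""
    if event["type"] == "deposited":
        return balance + event["amount"]
    if event["type"] == "withdrawn":
        return balance - event["amount"]
    raise ValueError(f"unknown event type: {event['type']}")


def replay(log: list, upto: Optional[int] = None) -> int:
    """Rebuild state by folding over the log.

    `upto` is an exclusive end offset; None means replay everything.
    """
    balance = 0
    for event in log[:upto]:
        balance = apply(balance, event)
    return balance


current = replay(events)         # state after all events
as_of = replay(events, upto=2)   # state as it was after the first two events
```

Because the log is the system of record, the same replay can rebuild a lost cache, populate a new read model, or answer "what was the balance at offset 2?" without any extra bookkeeping.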

The architectural model

A Kafka cluster is made of brokers (servers). Events are organized into topics, and each topic is split into ordered partitions that are distributed across brokers for scale and replicated for fault tolerance. Producers write to topics; consumers, organized into consumer groups, read from them.

                    +-------------------------------------+
   Producers        |            Kafka Cluster            |     Consumers
                    |                                     |
   +--------+       |  Topic: orders                      |     +-------------+
   | App A  |---->  |  +------- Partition 0 --------+     |     | Group X     |
   +--------+       |  | e0 | e1 | e2 | e3 | ...    |     |---> | (svc 1, 2)  |
   +--------+       |  +----------------------------+     |     +-------------+
   | App B  |---->  |  +------- Partition 1 --------+     |     +-------------+
   +--------+       |  | e0 | e1 | e2 | ...         |     |---> | Group Y     |
                    |  +----------------------------+     |     | (analytics) |
                    |  (partitions spread across          |     +-------------+
                    |   brokers, each replicated)         |
                    +-------------------------------------+

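Every partition is an ordered, immutable sequence of events, and each event is assigned a monotonically increasing offset within its partition. Ordering is guaranteed only within a partition, not across the partitions of a topic. Partitions are also the unit of parallelism: within a consumer group, each partition is read by one consumer at a time, so different members of the group process different partitions in parallel.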
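The sketch below models these two ideas in plain Python, with several assumptions worth stating: `partition_for`, `produce`, and `assign` are hypothetical helpers, the partition count is made up, CRC32 is only a stand-in hash (Kafka's default partitioner hashes keys with murmur2, so real placements differ), and the round-robin assignment is one simple strategy rather than Kafka's actual assignor logic.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for the "orders" topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition deterministically.

    CRC32 is a stand-in hash for this sketch, not what Kafka uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# partition number -> append-only list; the list index plays the role of the offset
partitions = defaultdict(list)


def produce(key: str, value: str) -> tuple:
    """Append to the partition chosen by the key; return (partition, offset)."""
    p = partition_for(key)
    partitions[p].append((key, value))
    return p, len(partitions[p]) - 1


def assign(partition_ids: list, consumers: list) -> dict:
    """Round-robin partitions over group members; each partition gets one owner."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partition_ids)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


# Interleave events for two orders; same key always lands in the same partition.
for step in ["created", "paid", "shipped"]:
    produce("order-42", step)
    produce("order-7", step)

group_x = assign(list(range(NUM_PARTITIONS)), ["svc-1", "svc-2"])
```

Since every event for `order-42` hashes to the same partition, its `created`, `paid`, `shipped` sequence is preserved there even though events for other keys are interleaved. With three partitions and two group members, one member owns two partitions and the other owns one; adding a fourth consumer to a three-partition topic would leave one consumer idle.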

Kafka vs traditional message queues

Aspect             Apache Kafka                                      Traditional message queue
-----------------  ------------------------------------------------  -----------------------------------
Storage model      Durable, replayable log                           Transient queue
Message retention  Time/size-based; survives consumption             Deleted once acknowledged
Consumption        Many independent consumers, each with own offset  Typically one consumer per message
Replay             Yes: reset offset to re-read history              Generally no
Throughput         Very high (millions/sec)                          Moderate
Ordering           Guaranteed per partition                          Per queue, often weaker under scale
Scaling unit       Partitions                                        Queues / prefetch tuning
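The "Message retention" row can be illustrated with a toy model. This is a sketch under assumptions, not Kafka's implementation: `RetainedLog` and `RETENTION_SECONDS` are hypothetical, timestamps are passed in explicitly to keep the example deterministic, and records are expired one at a time, whereas Kafka removes whole log segments. The point is that expiry depends on age, not on whether anyone has read the record.

```python
from collections import deque

RETENTION_SECONDS = 60  # hypothetical; stands in for a time-based retention setting


class RetainedLog:
    """Toy log whose records expire by age, independent of consumption."""

    def __init__(self):
        self._records = deque()  # entries are (offset, timestamp, value)
        self._next_offset = 0

    def append(self, value, now: float) -> int:
        """Store a record with its timestamp and return its offset."""
        offset = self._next_offset
        self._records.append((offset, now, value))
        self._next_offset += 1
        return offset

    def enforce_retention(self, now: float) -> None:
        """Drop records older than the retention window, oldest first."""
        while self._records and now - self._records[0][1] > RETENTION_SECONDS:
            self._records.popleft()

    def read_from(self, offset: int) -> list:
        """Return all retained values at or after the given offset."""
        return [value for (o, _, value) in self._records if o >= offset]


log = RetainedLog()
log.append("a", now=0.0)
log.append("b", now=50.0)

before = log.read_from(0)        # any number of readers see both records
log.enforce_retention(now=70.0)  # "a" is now 70s old, "b" only 20s
after = log.read_from(0)
```

Reading `before` did not remove anything, and only the passage of time caused `"a"` to disappear from `after`. Offsets stay stable across expiry: the next appended record still receives offset 2.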

Kafka is not a drop-in replacement for every queue. If you need per-message priority, fine-grained TTLs, or complex routing, a broker like RabbitMQ may fit better. Kafka shines when you need durable, replayable, high-throughput event streams.

Modern Kafka runs without ZooKeeper. KRaft mode moves cluster metadata management into Kafka itself, which removes a separate system to deploy, secure, and monitor. KRaft has been production-ready since Kafka 3.3, and Kafka 4.0 removed ZooKeeper support entirely. The next page shows how to get a cluster running locally.

Last updated June 1, 2026