Tutorial · Data Engineering

Apache Kafka Architecture, explained in plain English

Kafka can feel intimidating. Brokers, partitions, offsets, replicas, consumer groups - you can read about each one separately and still not see how they connect. Every piece is there to solve one problem: moving data from the systems that produce it to the systems that need it, reliably and at scale. We'll work through the whole thing using one example: an online store processing orders.

producer · topic · partition · offset · broker · replication · consumer group · leader / follower

01 The big picture

Imagine a busy online store. Thousands of customers place orders every second. Each order needs to trigger several things: a fraud check, an inventory update, a confirmation email, and an entry in the analytics dashboard.

In a traditional setup, the Order Service would call each of those systems directly, one after another, and wait for every reply. If the email server is slow, the whole order slows down. If one service is down, the order fails. This is called tight coupling, and it does not scale.

Kafka sits in the middle. The Order Service writes the order once and moves on. Every downstream service reads from Kafka at its own pace - they don't coordinate, and none of them wait.

[Figure: Producers (Order Service writing order events, Payment Service writing transactions, IoT Sensor writing readings) push into the Kafka cluster, which holds the topics orders (P0-P2), payments (P0-P1), and sensor-data (P0-P3), stored on disk and replicated. Consumers (Fraud Service and Inventory Service reading orders, Analytics reading everything) pull from the cluster.]
The whole architecture in one line: producers push → cluster stores → consumers pull.

The producer doesn't know consumers exist. Consumers don't know about each other. That one design decision is why the whole system scales without everything grinding to a halt.

02 Producers - who sends the data

A producer is any app that writes data into Kafka - a mobile app, a payment service, a sensor. Each thing it sends is called an event (or a message / record).

When a customer places order ORD-8821, the Order Service creates an event like this:

{
  "event":   "order.placed",
  "orderId": "ORD-8821",
  "userId":  "U-4491",
  "total":   118.99,
  "currency": "GBP",
  "ts":      "2026-05-27T09:14:02Z"
}

Real-life analogy: A producer is a passenger boarding a bus. They get on, hand over their ticket (the event), and the bus takes care of the rest. The passenger does not need to know who else is on the bus or where everyone is getting off.
✓ Pro

The producer writes the event once and moves on. It waits at most for Kafka's acknowledgement, not for four downstream services, so it can respond to the customer in milliseconds.

How it works

The producer library handles serialization, choosing a partition, batching, and retries automatically.

✗ Without it

Direct calls to four services mean four chances to fail, and latency that stacks up on every order.
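
To make the "serialization" step less abstract, here is a minimal sketch of what turning the order event into bytes could look like. It is not the Kafka client itself: `serialize_order_event` is a hypothetical helper, JSON is just one possible wire format, and keying by `userId` is an assumption that matches the partition-key example later in this article.

```python
import json


def serialize_order_event(event: dict) -> tuple[bytes, bytes]:
    """Turn an order event into the (key, value) byte pair a producer would send."""
    # Assumption: the user ID is the key, so one user's events travel together.
    key = event["userId"].encode("utf-8")
    # Compact JSON; sort_keys makes the byte output deterministic.
    value = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return key, value


order_event = {
    "event": "order.placed",
    "orderId": "ORD-8821",
    "userId": "U-4491",
    "total": 118.99,
    "currency": "GBP",
    "ts": "2026-05-27T09:14:02Z",
}

key, value = serialize_order_event(order_event)
```

A real producer library would do this for you through its configured serializers, then batch the bytes and send them to the right partition.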

03 Topics & partitions

A topic is a named channel for a category of events. All order events go into the orders topic; all payments go into the payments topic. Producers choose which topic to write to, and consumers choose which topic to read from.

Each topic is split into partitions. A partition is just an ordered list of messages on one machine. Splitting a topic this way lets Kafka spread the work across many servers - that's the main reason it scales.

Real-life analogy: A topic is a bus route number (Route 47 → City Centre). A partition is a row of seats on that bus. More rows means more passengers can travel at once.

04 Partition keys & ordering

How does Kafka decide which partition a message goes into? Through the partition key. Before sending, the producer client runs the key through a hash function and takes the result modulo the number of partitions:

// key = userId "U-4491"
hash("U-4491") % 3 = partition 1

As long as the number of partitions stays the same, the same key always maps to the same partition. So every event for user U-4491 - order placed, then paid, then shipped - lands in the same partition, in the order it was sent. That's your ordering guarantee per user.
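
The hash-then-modulo idea can be sketched in a few lines. This is an illustration only: `partition_for` is a hypothetical helper, and `zlib.crc32` is a stand-in hash chosen because it gives the same answer on every run. Kafka's default partitioner uses a different hash (murmur2), so the partition numbers this sketch produces will not match a real cluster.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index: hash(key) % num_partitions."""
    # Python's built-in hash() is randomised per process for strings,
    # so a stable hash is used instead. crc32 is a stand-in, not Kafka's hash.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Every event for the same user lands in the same partition.
events = ["order.placed", "order.paid", "order.shipped"]
partitions_used = {partition_for("U-4491", 3) for _ in events}
```

`partitions_used` ends up with a single element, which is the whole point: one key, one partition. Calling `partition_for("U-4491", 4)` may give a different index, which is why changing a topic's partition count can move keys around.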

✓ With a key

Events for one user stay in order. "Placed" is always read before "shipped". Logic stays correct.

No key?

Messages are spread across partitions for even load (round-robin in older clients, one batch at a time in newer ones) - fine when order doesn't matter.

✗ The risk

Without a key, a consumer might process "cancelled" before "placed" - broken business logic.

05 Brokers & the commit log

A broker is a single Kafka server. A group of them is a cluster. The broker's job: receive messages, assign each one a number, and append it to a file on disk called the commit log.

The broker has no business logic. It doesn't run fraud checks or send emails. It stores data and serves it - that's the whole job. The narrowness is intentional; it's what makes brokers predictable and fast.

Why write to disk instead of memory?

Sequential disk writes are faster than most people expect - appending to the end of a file skips the slow seek time that random I/O requires. And since the data is on disk, it survives broker restarts and can be replayed at any time until the retention period expires.
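
A toy version of an append-only log on disk shows both properties: writes only ever go to the end of the file, and the data can be re-read later. This is a sketch under simplifying assumptions, not Kafka's storage format (real brokers write binary segment files). The file name and the first two order IDs are made up for illustration.

```python
import os
import tempfile


def append_record(path: str, record: str) -> None:
    """Append one record to the end of the log file (a sequential write)."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(record + "\n")


def replay(path: str, from_offset: int = 0) -> list[tuple[int, str]]:
    """Re-read the log from a position; the line index plays the role of the offset."""
    with open(path, encoding="utf-8") as log:
        records = [line.rstrip("\n") for line in log]
    return list(enumerate(records))[from_offset:]


# Hypothetical log file for partition 0 of the orders topic.
log_path = os.path.join(tempfile.mkdtemp(), "orders-0.log")
for record in ["order.placed ORD-8819", "order.placed ORD-8820", "order.placed ORD-8821"]:
    append_record(log_path, record)

everything = replay(log_path)         # all three records, numbered 0, 1, 2
latest_only = replay(log_path, 2)     # [(2, "order.placed ORD-8821")]
```

Because the file is reopened on every call, `replay` works the same after the writing process has exited, which is the toy equivalent of surviving a broker restart.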

Real-life analogy: A broker is the bus itself - or more precisely, the conductor with a permanent logbook. Every passenger gets stamped with a seat number, written in ink, never erased.

06 Offsets - the bookmark

Every message inside a partition gets a sequential number called an offset. It is not a global unique ID for the order - it is simply the message's position within that one partition, like a line number in a notebook.

Partition 1 - topic: orders
offset 0 → U-3312 → order.placed
offset 1 → U-1188 → order.placed
offset 2 → U-4491 → order.placed   ← our order
offset 3 → (next write...)

Offset 2 in Partition 1 is a completely different message from offset 2 in Partition 0 - the number is local to each partition. Once written, it never changes and is never reused. A consumer tracks its last committed offset, so after a crash or restart it picks up exactly where it left off.
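
The "line number in a notebook" idea maps directly onto a list index. The sketch below is an in-memory stand-in, not broker code: the `Partition` class is hypothetical, the Partition 0 message is invented, and the bookmark is stored as the next offset to read, which is also the convention Kafka uses for committed offsets.

```python
class Partition:
    """An append-only list; a message's offset is its index in the list."""

    def __init__(self) -> None:
        self._messages: list[str] = []

    def append(self, message: str) -> int:
        self._messages.append(message)
        return len(self._messages) - 1  # the offset assigned to this message

    def read_from(self, offset: int) -> list[tuple[int, str]]:
        return list(enumerate(self._messages))[offset:]


p0, p1 = Partition(), Partition()
p0.append("U-1001 order.placed")          # offset 0 in Partition 0 (made-up message)
p1.append("U-3312 order.placed")          # offset 0 in Partition 1
p1.append("U-1188 order.placed")          # offset 1
ours = p1.append("U-4491 order.placed")   # offset 2, our order

# The consumer finished offsets 0 and 1 before restarting,
# so its bookmark points at the next offset to read.
committed = 2
after_restart = p1.read_from(committed)   # [(2, "U-4491 order.placed")]
```

Both partitions have an offset 0, and they hold different messages: the number only means something together with the partition it belongs to.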

07 Partition vs replica - two different things

Partitioning splits different messages across different machines. Replication makes copies of the same partition for safety. They sound related but solve different problems - and mixing them up is a common source of confusion for Kafka newcomers.

[Figure: Partitioning - 3 orders go to 3 partitions on 3 machines: Order A (U-1001) in Partition 0 on Broker 1 only, Order B (U-4491) in Partition 1 on Broker 2 only, Order C (U-7734) in Partition 2 on Broker 3 only. Each message lives in exactly one partition. Purpose: parallel reads and horizontal scale. Replication - Partition 1 is copied to 3 brokers for safety: the leader on Broker 2, followers with the same data on Brokers 1 and 3. If Broker 2 dies, a follower is promoted to leader and no data is lost. Purpose: fault tolerance and zero data loss.]
Partitioning distributes different data. Replication duplicates the same data.

So when you have a topic with 3 partitions and a replication factor of 3, you end up with 3 lanes of different data, and each lane is copied onto 3 machines. One is for scale, the other is for safety.
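
The arithmetic (3 partitions × replication factor 3 = 9 partition copies) is easy to lay out explicitly. The placement rule below is a simplified round-robin chosen for illustration; it is an assumption, not Kafka's actual replica assignment algorithm, and `replica_layout` is a hypothetical helper. The first broker in each list is treated as the leader.

```python
def replica_layout(
    num_partitions: int, replication_factor: int, brokers: list[int]
) -> dict[int, list[int]]:
    """Place each partition's replicas on distinct brokers, round-robin."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    layout: dict[int, list[int]] = {}
    for partition in range(num_partitions):
        layout[partition] = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
    return layout


layout = replica_layout(num_partitions=3, replication_factor=3, brokers=[1, 2, 3])
# {0: [1, 2, 3], 1: [2, 3, 1], 2: [3, 1, 2]}
total_copies = sum(len(replicas) for replicas in layout.values())  # 9
```

Reading the result by row gives replication (Partition 1 lives on Brokers 2, 3, and 1); reading it by first column gives partitioning (each partition is led by a different broker).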

08 Leader, follower & ISR replication

Among the copies of each partition, one is the leader and the rest are followers. By default, producers and consumers talk only to the leader. Followers serve no client traffic in that setup - they quietly copy from the leader and stay ready to take over.

The set of replicas that are fully caught up (the leader included) is called the ISR (In-Sync Replicas). With acks=all, the leader confirms a write only once it and every in-sync follower have stored the message, so an acknowledged write survives the loss of the leader as long as another in-sync replica remains.

// what happens when fraud writes a result
Fraud Service ──writes──▸ leader (fraud-results)
                           ├─ copies to follower 1
                           └─ copies to follower 2
                           all in sync ──▸ ACK back to Fraud Service

If the leader crashes, Kafka's controller detects the failure, typically within seconds, and promotes one of the in-sync followers to leader. Producers and consumers discover the new leader and reconnect automatically, with no manual intervention.
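
The acks=all write and the failover can be modelled with a small in-memory class. Treat it as a rough sketch: `ReplicatedPartition` and its methods are hypothetical, the leader pushes copies here whereas real followers fetch from the leader, and picking the lowest broker ID on failover is only a way to make the example deterministic.

```python
class ReplicatedPartition:
    """One partition with a leader, followers, and an in-sync replica set."""

    def __init__(self, leader: int, followers: list[int]) -> None:
        self.leader = leader
        self.logs: dict[int, list[str]] = {b: [] for b in [leader, *followers]}
        self.isr: set[int] = {leader, *followers}

    def write_acks_all(self, message: str) -> bool:
        """Store the message on every in-sync replica, then acknowledge."""
        for broker in self.isr:
            self.logs[broker].append(message)
        return True  # ACK is sent only after the loop has finished

    def fail_leader(self) -> int:
        """Drop the leader and promote an in-sync follower."""
        self.isr.discard(self.leader)
        self.logs.pop(self.leader)
        if not self.isr:
            raise RuntimeError("no in-sync replica left to promote")
        self.leader = min(self.isr)  # any ISR member has the full log
        return self.leader


# Leader on Broker 2, followers on Brokers 1 and 3.
partition = ReplicatedPartition(leader=2, followers=[1, 3])
acked = partition.write_acks_all("fraud.result ORD-8821")
new_leader = partition.fail_leader()
surviving_log = partition.logs[new_leader]  # still holds the acknowledged write
```

Because the acknowledgement is only returned after every in-sync replica has the message, whichever follower is promoted already holds everything the producer was told is safe.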

Real-life analogy: The leader is the main warehouse that handles all shipments. The followers are backup warehouses kept stocked to match. If the main one burns down, a backup takes over within moments - customers barely notice.

09 Consumers & consumer groups

A consumer reads messages from a topic and does something with them - a Fraud Service, an Email Service, an Analytics pipeline. The partition is passive storage; the consumer is what actually processes the data.

Consumers that cooperate are organised into a consumer group. Within one group, each partition is assigned to exactly one consumer instance at a time, so the work is split and, in normal operation, each message is handled by only one member of the group.

[Figure: The Kafka broker as passive storage only (no logic, no APIs, just an append-only log on disk) holding Partitions 0-2, each with offsets 0-3. Two consumer groups pull the same partitions and each keeps its own offsets. fraud-detection-cg: fraud-instance-1 reads P0 at offset 3, fraud-instance-2 reads P1 at offset 2, fraud-instance-3 reads P2 at offset 3. email-notification-cg: email-instance-1 reads P0 at offset 1, email-instance-2 reads P1 at offset 0, email-instance-3 reads P2 at offset 3.]
Two layers: passive storage (top) and active consumers (bottom). The email group being behind never slows fraud.

Both groups read the same partitions, but each tracks its own offset independently. The email group sitting at offset 0 has no effect on the fraud group at offset 3 - separate processes, separate machines, no coordination required.
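
Two ideas from this section fit in a short sketch: partitions are divided among the members of one group, and each group keeps its own bookmarks. The round-robin split below is a simplification (Kafka ships several assignment strategies), and `assign_partitions` is a hypothetical helper. The group names and offsets are the ones from the diagram.

```python
def assign_partitions(partitions: list[int], members: list[str]) -> dict[str, list[int]]:
    """Give each partition to exactly one member of the group, round-robin."""
    assignment: dict[str, list[int]] = {member: [] for member in members}
    for i, partition in enumerate(partitions):
        assignment[members[i % len(members)]].append(partition)
    return assignment


three_members = assign_partitions(
    [0, 1, 2], ["fraud-instance-1", "fraud-instance-2", "fraud-instance-3"]
)
# one partition each

two_members = assign_partitions([0, 1, 2], ["fraud-instance-1", "fraud-instance-2"])
# {"fraud-instance-1": [0, 2], "fraud-instance-2": [1]}

# Bookmarks are stored per (group, partition), so groups never share progress.
committed = {
    ("fraud-detection-cg", 0): 3,
    ("email-notification-cg", 0): 1,
}
committed[("email-notification-cg", 0)] += 1  # email advances; fraud is untouched
```

With fewer members than partitions, some instances take more than one partition, but no partition is ever shared inside the group.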

10 The pull model - how consumers "know"

Nobody tells the Fraud Service a new order arrived. It runs a poll loop - an endless loop that asks the broker for new messages, waiting up to ~100 ms on each call. When something new has arrived, the next poll returns it.

while (true) {
  messages = consumer.poll(100ms)   // wait up to 100 ms for new messages
  for each message in messages {
    runFraudCheck(message)          // do the work
    commitOffset()                  // then move the bookmark forward
  }
}

Kafka doesn't push messages to consumers - consumers pull. A slow consumer can't be overwhelmed because it reads only when it's ready. The offset gets committed after the work completes, not before. So if the consumer crashes mid-processing, the message gets re-read on restart rather than silently dropped.
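
The effect of committing after the work, rather than before, can be simulated without a broker. In this sketch the partition is a plain list, `run_consumer` is a hypothetical function, and `crash_at` is an artificial switch that makes the consumer die after processing a message but before committing it.

```python
from typing import Optional


def run_consumer(
    log: list[str], committed: int, crash_at: Optional[int] = None
) -> tuple[list[str], int]:
    """Process messages from the committed offset onward.

    Returns the messages processed in this run and the new committed offset.
    """
    processed: list[str] = []
    for offset in range(committed, len(log)):
        processed.append(log[offset])   # do the work first
        if offset == crash_at:
            return processed, committed  # crashed: the commit never happened
        committed = offset + 1           # commit only after the work completed
    return processed, committed


log = ["order-A", "order-B", "order-C"]

first_run, committed = run_consumer(log, committed=0, crash_at=1)
# first_run == ["order-A", "order-B"], committed == 1

second_run, committed = run_consumer(log, committed)
# second_run == ["order-B", "order-C"], committed == 3
```

Nothing is lost, but `order-B` is processed twice. That is the trade-off of this commit order: messages are handled at least once, so the work itself should tolerate a repeat.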

11 Fan-out - one event, many readers

Any number of consumer groups can read the same orders topic simultaneously - Fraud, Inventory, Email, Analytics, and whatever else you add next month.

Each consumer writes its own result wherever it needs - fraud writes a score to a fraud-results topic, inventory updates its database, email calls an SMTP server. The original message on the orders topic is never modified or deleted; it just sits in the log until retention expires.

12 The four Kafka APIs

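Kafka exposes four client APIs:

▸ Producer API - lets an application publish events to topics
▸ Consumer API - lets an application subscribe to topics and process the events in them
▸ Streams API - lets an application transform data from input topics into output topics
▸ Connect API - links topics to external systems such as databases through reusable connectors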

One more piece worth naming: the cluster needs a "control office" that tracks which brokers are alive and runs leader elections. Older Kafka used ZooKeeper for this; modern Kafka uses a built-in mechanism called KRaft, removing the external dependency.

Key takeaways

Credits & further reading

Based on the official Apache Kafka documentation and GeeksforGeeks' Kafka architecture guide. For the full technical reference:

▸ Apache Kafka Documentation - kafka.apache.org/documentation
▸ GeeksforGeeks - Kafka Architecture - geeksforgeeks.org/apache-kafka/kafka-architecture
Apache Kafka® is a registered trademark of the Apache Software Foundation. This article is an independent educational explainer.