Apache Kafka for Architects: From Concepts to Production Patterns -

A hands-on guide built around a real microservices system — order processing, notification fan-out, multi-broker clustering, and delivery guarantees.

1. Why Kafka? The Paradigm Shift

Traditional message queues (RabbitMQ, ActiveMQ, SQS) are built around a simple model:

Producer → [Queue] → Consumer → MESSAGE DELETED

Kafka rejects this model entirely. It is a distributed commit log, not a queue:

Producer → [Immutable Log] → Consumer A (offset 5) → Consumer B (offset 2) → Consumer C (offset 9) Message lives until retention expires — not when consumed

The consequences of this design

Traditional Queue	Kafka
Message deleted after consumption	Message retained (days, weeks, forever)
One consumer per message	Unlimited consumer groups, each reads every message
Scaling = more consumers	Scaling = more partitions
Replay is impossible	Replay by seeking to any offset
Ordered delivery hard	Ordered within a partition by design

When Kafka wins:

Fan-out to multiple independent consumers
Event sourcing / audit log
Stream processing pipelines
Decoupling microservices at scale
Replayable event history

When Kafka loses:

Simple task queues (use SQS/RabbitMQ)
Request-reply patterns (use gRPC)
Low message volume with complex routing rules

2. Core Concepts Every Architect Must Know

Topic

A named, append-only log. Think of it as a database table that only supports INSERT and SELECT, never DELETE or UPDATE.

notification-topic ───────────────────────────────────────────────── offset: 0 1 2 3 4 msg: [evt-A] [evt-B] [evt-C] [evt-D] [evt-E] ← new messages append here

Partition

Topics are split into partitions for parallelism. Each partition is an independent ordered log.

notification-topic (2 partitions) Partition 0: [evt-A] [evt-C] [evt-E] ← orderId hash ends in even Partition 1: [evt-B] [evt-D] [evt-F] ← orderId hash ends in odd

Key insight: Ordering is guaranteed within a partition, not across partitions.
Key design rule: Use the entity ID (orderId, userId) as the message key → all events for the same entity land on the same partition → ordering preserved.

Consumer Group

A group of consumers that collectively consume a topic. Kafka assigns each partition to exactly one consumer in the group at a time.

notification-group (3 consumers, 2 partitions) Partition 0 ──► Consumer 1 ✅ ACTIVE Partition 1 ──► Consumer 2 ✅ ACTIVE Consumer 3 ⏸ IDLE ← no partition to assign

Critical rule: Adding consumers beyond the partition count = idle consumers.

Offset

A sequential ID for each message within a partition. Each consumer group tracks its own offset independently.

dispatch-topic Partition 0 [msg0] [msg1] [msg2] [msg3] [msg4] ↑ ↑ notification-sms notification-email offset=2 offset=4 notification-sms has 2 unread messages notification-email has caught up

3. The Architecture I Built for PoC

Our system has four microservices connected by two Kafka topics:

Why this design?

Decision	Reason
Separate `notification-topic` and `dispatch-topic`	`notification-topic` carries business events; `dispatch-topic` carries delivery instructions. Different concerns, different schemas, different retention needs.
notification-service as intermediary	Decouples order-service from knowing SMS/Email exist. Adding push notifications means zero changes to order-service.
orderId as message key	All events for an order go to the same partition → processing order preserved
2 partitions on notification-topic	Low volume; 3 consumers gives idle demo
5 partitions on dispatch-topic	Higher volume expected (SMS + EMAIL = 2x messages); more parallelism

4. Deep Dive: Topics and Partitions

How partitioning works

// order-service: NotificationController.javakafkaTemplate.send(
    "notification-topic",
    order.getId(),      // ← KEY: Kafka hashes this to pick a partition
    notificationEvent
);

Kafka’s default partitioner:

partition = hash(key) % numPartitions

orderId = "abc-123" → hash = 847392 → 847392 % 2 = 0 → Partition 0
orderId = "def-456" → hash = 123847 → 123847 % 2 = 1 → Partition 1

No key provided? Kafka uses round-robin across partitions.

How many partitions?

Rule of thumb: partitions = max(target throughput / partition throughput) notification-topic: low volume (one per order) → 2 partitions dispatch-topic: 2x volume (SMS + EMAIL) → 5 partitions More partitions = more parallelism BUT - more open file handles on brokers - longer leader election on failure - more consumer threads needed to fully utilize

Replication factor

RF=3 means 3 copies of each partition across 3 brokers Partition 0: Broker 1 → LEADER (handles reads and writes) Broker 2 → FOLLOWER (replicates from leader) Broker 3 → FOLLOWER (replicates from leader) If Broker 1 dies: Kafka elects Broker 2 or 3 as new leader Zero message loss (data was replicated) Zero downtime for consumers (within seconds of re-election)

5. Deep Dive: Consumer Groups and Idle Consumers

Partition assignment rules

Rule: Each partition is assigned to exactly ONE consumer per group Rule: Consumers ≤ Partitions → all consumers are active Rule: Consumers > Partitions → excess consumers are idle notification-topic (2 partitions) + notification-group (3 consumers): Consumer 1 ──► Partition 0 ✅ Consumer 2 ──► Partition 1 ✅ Consumer 3 ──► (nothing) ⏸ dispatch-topic (5 partitions) + notification-sms (6 consumers): Consumer 1 ──► Partition 0 ✅ Consumer 2 ──► Partition 1 ✅ Consumer 3 ──► Partition 2 ✅ Consumer 4 ──► Partition 3 ✅ Consumer 5 ──► Partition 4 ✅ Consumer 6 ──► (nothing) ⏸

Why have idle consumers?

Standby for instant failover. If Consumer 1 dies:

Before: Consumer 1 ──► P0 ✅ Consumer 2 ──► P1 ✅ Consumer 3 ──► idle Consumer 1 crashes → Kafka triggers REBALANCE: Consumer 2 ──► P1 ✅ Consumer 3 ──► P0 ✅ (Consumer 1 gone)

Without an idle consumer, P0 would have zero consumers until a new one starts. With an idle consumer, failover is instant — the rebalance promotes the standby.

Rebalancing

When group membership changes (consumer joins/leaves/crashes), Kafka reassigns all partitions:

// Our consumers implement ConsumerSeekAware to log assignments@Override
public void onPartitionsAssigned(Map<TopicPartition, Long> assignments,
                                 ConsumerSeekCallback callback) {
    if (assignments.isEmpty()) {
        log.warn("[SMS] ⏸ IDLE | thread={}", Thread.currentThread().getName());
    } else {
        assignments.keySet().forEach(tp ->
            log.info("[SMS] 🎯 ASSIGNED | partition={}", tp.partition()));
    }
}

6. Deep Dive: Delivery Guarantees

The `acks` setting — producer side

Controls how many brokers must acknowledge before the producer considers the write successful.

acks=0 Fire and forget Producer ──write──► Broker 1 ◄──────── (no wait) Risk: silent data loss acks=1 Leader acknowledgement Producer ──write──► Broker 1 (leader) ──► ACK └── replicates async ──► Broker 2, 3 Risk: data loss if leader crashes before replication completes acks=all All in-sync replicas Producer ──write──► Broker 1 ──replicate──► Broker 2 ──► ACK └──replicate──► Broker 3 ──► ACK ◄────────────────────────────────────────────────── Risk: none (survives any single broker failure)

`min.insync.replicas` — broker/topic side

The minimum number of ISRs a partition must have before accepting writes with acks=all.

# docker-compose-cluster.yml
KAFKA_MIN_INSYNC_REPLICAS: 2

dispatch-topic RF=3: ISRs = [Broker1, Broker2, Broker3] # 3 ISRs min.insync.replicas = 2 acks=all → needs 3 ACKs → 3 ≥ 2 ✅ → WRITE ACCEPTED Broker 3 goes down: ISRs = [Broker1, Broker2] # 2 ISRs min.insync.replicas = 2 acks=all → needs 2 ACKs → 2 ≥ 2 ✅ → WRITE ACCEPTED (cluster still healthy) Broker 2 also goes down: ISRs = [Broker1] # 1 ISR min.insync.replicas = 2 acks=all → 1 < 2 ❌ → NOT_ENOUGH_REPLICAS (cluster protects data integrity)

`ErrorHandlingDeserializer` — consumer resilience

When a consumer encounters a message it can’t deserialize (wrong format, wrong class name), the default behavior crashes the container. Use ErrorHandlingDeserializer to skip bad records:

// notification-service: KafkaConsumerConfig.java
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ErrorHandlingDeserializer.class);
props.put(ErrorHandlingDeserializer.VALUE_DESERIALIZER_CLASS, JsonDeserializer.class);
props.put(JsonDeserializer.VALUE_DEFAULT_TYPE,
          "in.asvignesh.notification.model.NotificationEvent");
props.put(JsonDeserializer.USE_TYPE_INFO_HEADERS, false);  // ignore __TypeId__ header

Why USE_TYPE_INFO_HEADERS=false?

By default, Spring’s JsonSerializer adds a __TypeId__ header with the fully-qualified class name:

Producer in order-service publishes:
  Header: __TypeId__ = in.asvignesh.order.model.Order

Consumer in notification-service sees:
  "I need to load class in.asvignesh.order.model.Order"
  ClassNotFoundException → SerializationException → crash

Disabling type headers and specifying VALUE_DEFAULT_TYPE means the consumer always deserializes to NotificationEvent regardless of what the producer says.

7. Fan-out Patterns

Fan-out is the ability for one produced message to trigger multiple independent downstream processes.

Pattern A: Multiple Consumer Groups (simplest)

What you have with dispatch-topic.

dispatch-topic │ ├──► notification-sms (own offset, own scaling) → 📱 SMS └──► notification-email (own offset, own scaling) → 📧 Email Adding push notifications = add one consumer group, zero other changes: └──► notification-push (new group reads from offset 0) → 📲 Push

Best for: Independent consumers with same message format.

Pattern B: Intermediate Processor Fan-out (what notification-service does)

notification-topic [NotificationEvent] │ ▼ notification-service │ Creates channel-specific DispatchEvents ├──► dispatch-topic [DispatchEvent{channel=SMS}] └──► dispatch-topic [DispatchEvent{channel=EMAIL}]

Best for: When downstream consumers need enriched, transformed, or channel-specific messages.

Pattern C: Topic-per-Consumer

order-service ├──► sms-topic → notification-sms ├──► email-topic → notification-email └──► push-topic → notification-push

Best for: Different schemas per consumer, different retention/partition requirements.
Avoid when: Producer shouldn’t know about consumers (tight coupling).

Pattern D: Event Router

events-topic [all events] │ ▼ [Router Service] ├── ORDER_PLACED ──► order-placed-topic ├── ORDER_SHIPPED ──► order-shipped-topic └── CANCELLED ──► cancellation-topic

Best for: Content-based routing to specialized consumers.
Avoid when: Router becomes a bottleneck or single point of failure.

Pattern E: Aggregator (reverse fan-out)

The opposite of fan-out — merge multiple streams into one.

sms-results-topic ─┐ email-results-topic ─┼──► [Aggregator] ──► notification-audit-topic push-results-topic ─┘

Best for: Audit logs, metrics aggregation, saga orchestration.

Choosing a fan-out pattern

Same message format for all consumers? YES → Pattern A (multiple consumer groups) NO → Pattern B or C Producer should be decoupled from consumers? YES → Pattern A or B NO → Pattern C Need content-based routing? YES → Pattern D NO → Pattern A Need to merge multiple streams? YES → Pattern E

8. Multi-Broker Clustering

KRaft mode (no Zookeeper)

Our cluster uses Kafka’s built-in KRaft consensus (available since Kafka 3.3, stable since 3.6):

# docker-compose-cluster.yml
KAFKA_PROCESS_ROLES: broker,controller   # each node is both
KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093

kafka-1 ◄──► kafka-2 ◄──► kafka-3
  (KRaft quorum — majority vote for leader election)
  
  3 nodes → majority = 2 → can survive 1 node failure

The listener misconfiguration that caused coordinator errors

# BROKEN — all three brokers advertise localhost
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092

# Inside Docker: localhost = the container itself
# When Broker 1 tells clients "coordinator is at localhost:9094"
# Broker 1 tries to connect to its own localhost:9094 → fails
# Result: COORDINATOR_NOT_AVAILABLE

# FIXED — separate internal and external listenersKAFKA_LISTENERS: EXTERNAL://0.0.0.0:9092,INTERNAL://0.0.0.0:29092,CONTROLLER://0.0.0.0:9093
KAFKA_ADVERTISED_LISTENERS: EXTERNAL://localhost:9092,INTERNAL://kafka-1:29092
KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL

# Brokers talk to each other via kafka-1:29092 (Docker DNS resolves container names)
# Spring Boot apps connect via localhost:9092 (host port mapping)

9. Offset Management and Message Retention

Kafka is a log, not a queue

After consumer reads a message → message STAYS in the log Consumer's offset pointer moves forward Message deleted only when retention expires

Offset commit strategies

// Auto-commit (default, risky):
spring.kafka.consumer.enable-auto-commit=true
// Problem: offset committed before processing completes
// If crash between commit and processing → message lost

// Manual commit (safer):
factory.getContainerProperties().setAckMode(AckMode.MANUAL);
// Then in listener:
acknowledgment.acknowledge();
// Only commit after successful processing

Retention policies

# Time-based (default: 7 days)
kafka-configs --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name dispatch-topic \
  --add-config retention.ms=86400000    # 1 day

# Size-based
  --add-config retention.bytes=1073741824  # 1 GB per partition

# Compact (keep only latest value per key — useful for state)
  --add-config cleanup.policy=compact

Replay — the superpower

# Reset a consumer group to the beginning of a topic
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --group notification-sms \
  --topic dispatch-topic \
  --reset-offsets --to-earliest --execute

# Reset to a specific timestamp (e.g., replay last 2 hours)
  --reset-offsets --to-datetime 2024-05-21T12:00:00.000 --execute

Use cases for replay:

Bug fix deployed → replay affected messages
New downstream service → read full history
Analytics job → reprocess all historical events

10. Architect-Level Decision Guide

Should you use Kafka for this?

Is throughput > 10,000 msg/sec? YES → Kafka Do you need multiple independent consumers? YES → Kafka Do you need message replay? YES → Kafka Do you need event sourcing? YES → Kafka Is it a simple task queue? NO → SQS / RabbitMQ Do you need request-reply? NO → gRPC / REST Is message volume very low? NO → Overkill, use simpler tools

Partition count decisions

Start with: partitions = number of consumers you expect at peak Grow rule: you can always increase partitions, never decrease (decreasing breaks key-based routing) notification-topic: 2 (intake is low, 3 consumers demo idle) dispatch-topic: 5 (higher volume, demonstrate scaling)

Replication and durability decisions

Development / demo: RF=1, acks=1 (speed over safety) Staging: RF=2, acks=1 (some redundancy) Production: RF=3, acks=all (full durability) min.insync.replicas=2 Financial / critical: RF=3, acks=all (+ enable idempotent producer) min.insync.replicas=2 enable.idempotence=true

Consumer group sizing

Peak throughput / partition throughput = required partitions = max useful consumers

notification-sms:  5 partitions → max 5 active consumers
                   Add 1 idle for instant failover → deploy 6

notification-email: 5 partitions → max 5 active consumers
                    Add 3 idle (more standby headroom) → deploy 8

Architecture Summary

Built with Spring Boot 3.3, Spring Kafka 3.2, Apache Kafka 3.7, KRaft mode