Apache Kafka for Architects: From Concepts to Production Patterns

A hands-on guide built around a real microservices system — order processing, notification fan-out, multi-broker clustering, and delivery guarantees.

1. Why Kafka? The Paradigm Shift

Traditional message queues (RabbitMQ, ActiveMQ, SQS) are built around a simple model:

Producer → [Queue] → Consumer → MESSAGE DELETED

Kafka rejects this model entirely. It is a distributed commit log, not a queue:

Producer → [Immutable Log] → Consumer A (offset 5)
                           → Consumer B (offset 2)
                           → Consumer C (offset 9)

A message lives until its retention period expires, not until it is consumed.

The consequences of this design

Traditional Queue                 | Kafka
----------------------------------+---------------------------------------------------
Message deleted after consumption | Message retained (days, weeks, forever)
One consumer per message          | Unlimited consumer groups, each reads every message
Scaling = more consumers          | Scaling = more partitions
Replay is impossible              | Replay by seeking to any offset
Ordered delivery hard             | Ordered within a partition by design

When Kafka wins:

  • Fan-out to multiple independent consumers
  • Event sourcing / audit log
  • Stream processing pipelines
  • Decoupling microservices at scale
  • Replayable event history

When Kafka loses:

  • Simple task queues (use SQS/RabbitMQ)
  • Request-reply patterns (use gRPC)
  • Low message volume with complex routing rules

2. Core Concepts Every Architect Must Know

Topic

A named, append-only log. Think of it as a database table that only supports INSERT and SELECT, never DELETE or UPDATE.

notification-topic
─────────────────────────────────────────────────
offset: 0 1 2 3 4
msg: [evt-A] [evt-B] [evt-C] [evt-D] [evt-E]
← new messages append here

Partition

Topics are split into partitions for parallelism. Each partition is an independent ordered log.

notification-topic (2 partitions)

Partition 0: [evt-A] [evt-C] [evt-E] ← hash(orderId) % 2 = 0
Partition 1: [evt-B] [evt-D] [evt-F] ← hash(orderId) % 2 = 1

Key insight: Ordering is guaranteed within a partition, not across partitions.
Key design rule: Use the entity ID (orderId, userId) as the message key → all events for the same entity land on the same partition → ordering preserved.

Consumer Group

A group of consumers that collectively consume a topic. Kafka assigns each partition to exactly one consumer in the group at a time.

notification-group (3 consumers, 2 partitions)

Partition 0 ──► Consumer 1 ✅ ACTIVE
Partition 1 ──► Consumer 2 ✅ ACTIVE
Consumer 3 ⏸ IDLE ← no partition to assign

Critical rule: Adding consumers beyond the partition count = idle consumers.

Offset

A sequential ID for each message within a partition. Each consumer group tracks its own offset independently.

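dispatch-topic, Partition 0
[msg0] [msg1] [msg2] [msg3] [msg4]
                 ↑             ↑
                 │             └── notification-email (last processed offset = 4)
                 └── notification-sms (last processed offset = 2)

notification-sms has 2 unread messages (msg3, msg4)
notification-email has caught up
(Kafka stores the committed offset as "next to read", i.e. last processed + 1.)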
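
To make the "unread messages" arithmetic concrete, here is a minimal sketch of how consumer lag can be computed. It assumes the usual convention that a committed offset points at the next message to read, so a group that last processed msg2 has committed offset 3. The class and method names are made up for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ConsumerLagSketch {

    // Lag = messages written to the partition that the group has not consumed yet.
    static long lag(long logEndOffset, long committedOffset) {
        return Math.max(0, logEndOffset - committedOffset);
    }

    public static void main(String[] args) {
        long logEndOffset = 5; // msg0..msg4 are written, so the next write lands at offset 5

        Map<String, Long> committed = new LinkedHashMap<>();
        committed.put("notification-sms", 3L);   // last processed msg2
        committed.put("notification-email", 5L); // last processed msg4

        committed.forEach((group, offset) ->
            System.out.println(group + " lag=" + lag(logEndOffset, offset)));
    }
}
```

Each group's lag is independent of the others, which is why a slow SMS consumer never holds back the email consumer.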


3. The Architecture I Built for the PoC

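Our system has four microservices connected by two Kafka topics:

order-service ──► notification-topic ──► notification-service ──► dispatch-topic ──┬──► notification-sms
                                                                                   └──► notification-email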

Why this design?

Decision | Reason
---------+-------
Separate notification-topic and dispatch-topic | notification-topic carries business events; dispatch-topic carries delivery instructions. Different concerns, different schemas, different retention needs.
notification-service as intermediary | Decouples order-service from knowing SMS/Email exist. Adding push notifications means zero changes to order-service.
orderId as message key | All events for an order go to the same partition → processing order preserved
2 partitions on notification-topic | Low volume; with 3 consumers, one stays idle, which demonstrates idle-consumer behavior
5 partitions on dispatch-topic | Higher volume expected (SMS + EMAIL = 2x messages); more parallelism

4. Deep Dive: Topics and Partitions

How partitioning works

// order-service: NotificationController.java
kafkaTemplate.send(
    "notification-topic",
    order.getId(),      // ← KEY: Kafka hashes this to pick a partition
    notificationEvent
);

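Kafka’s default partitioner (for records that have a key):

partition = positive(murmur2(keyBytes)) % numPartitions

orderId = "abc-123" → hash = 847392 → 847392 % 2 = 0 → Partition 0
orderId = "def-456" → hash = 123847 → 123847 % 2 = 1 → Partition 1
(hash values are illustrative)

No key provided? Since Kafka 2.4 the producer uses sticky partitioning: it fills a batch for one partition and then moves on to another, rather than doing strict per-record round-robin.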
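
The key-to-partition mapping can be sketched in a few lines. This is an illustration only: Kafka's real partitioner hashes the serialized key with murmur2, while the sketch below swaps in `Arrays.hashCode` so it runs without the Kafka client library. The class name `KeyPartitioningSketch` is invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class KeyPartitioningSketch {

    // Stand-in for Kafka's partitioner: hash the key bytes, force a non-negative
    // value, then take the remainder by the partition count.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes); // assumption: Kafka itself uses murmur2 here
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // The same orderId always maps to the same partition.
        for (String orderId : new String[] {"abc-123", "def-456", "abc-123"}) {
            System.out.println(orderId + " -> partition " + partitionFor(orderId, 2));
        }
    }
}
```

The property that matters for ordering is determinism: as long as the partition count is unchanged, every event for a given orderId lands in the same partition.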

How many partitions?

Rule of thumb:
partitions = max(t / p, t / c)
where t = target throughput, p = producer throughput per partition, c = consumer throughput per partition

notification-topic: low volume (one per order) → 2 partitions
dispatch-topic: 2x volume (SMS + EMAIL) → 5 partitions

More partitions = more parallelism BUT
- more open file handles on brokers
- longer leader election on failure
- more consumer threads needed to fully utilize
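
As a worked example of the sizing rule, the sketch below takes the larger of the producer-side and consumer-side requirement and rounds up. The throughput figures are hypothetical; measure your own per-partition numbers before relying on anything like this.

```java
final class PartitionCountSketch {

    // t = target throughput, p = per-partition producer throughput,
    // c = per-partition consumer throughput (all in the same unit, e.g. MB/s).
    static int requiredPartitions(double t, double p, double c) {
        int forProducers = (int) Math.ceil(t / p);
        int forConsumers = (int) Math.ceil(t / c);
        return Math.max(1, Math.max(forProducers, forConsumers));
    }

    public static void main(String[] args) {
        // Hypothetical: 100 MB/s target, 20 MB/s produce and 25 MB/s consume per partition.
        System.out.println(requiredPartitions(100, 20, 25)); // prints 5
    }
}
```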

Replication factor

RF=3 means 3 copies of each partition across 3 brokers

Partition 0:
Broker 1 → LEADER (handles reads and writes)
Broker 2 → FOLLOWER (replicates from leader)
Broker 3 → FOLLOWER (replicates from leader)

If Broker 1 dies:
Kafka elects Broker 2 or 3 (whichever is in the ISR) as the new leader
No loss of acknowledged messages, provided the producer used acks=all
Clients see a brief pause (typically seconds) until the new leader is elected
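
The failover step can be pictured as "pick a surviving in-sync replica". The sketch below is a simplification under that assumption; it does not reproduce the controller's actual election logic, and the names are invented for illustration.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class LeaderFailoverSketch {

    // Picks the first in-sync replica that is still alive.
    // An empty result means no eligible replica is left and the partition is offline.
    static Optional<Integer> electLeader(List<Integer> inSyncReplicas, Set<Integer> failedBrokers) {
        return inSyncReplicas.stream()
            .filter(broker -> !failedBrokers.contains(broker))
            .findFirst();
    }

    public static void main(String[] args) {
        List<Integer> isr = List.of(1, 2, 3); // broker 1 is the current leader
        System.out.println(electLeader(isr, Set.of(1)));       // broker 1 dies
        System.out.println(electLeader(isr, Set.of(1, 2, 3))); // every replica is down
    }
}
```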


5. Deep Dive: Consumer Groups and Idle Consumers

Partition assignment rules

Rule: Each partition is assigned to exactly ONE consumer per group
Rule: Consumers ≤ Partitions → all consumers are active
Rule: Consumers > Partitions → excess consumers are idle

notification-topic (2 partitions) + notification-group (3 consumers):

Consumer 1 ──► Partition 0 ✅
Consumer 2 ──► Partition 1 ✅
Consumer 3 ──► (nothing) ⏸

dispatch-topic (5 partitions) + notification-sms (6 consumers):

Consumer 1 ──► Partition 0 ✅
Consumer 2 ──► Partition 1 ✅
Consumer 3 ──► Partition 2 ✅
Consumer 4 ──► Partition 3 ✅
Consumer 5 ──► Partition 4 ✅
Consumer 6 ──► (nothing) ⏸

Why have idle consumers?

Standby for fast failover. If Consumer 1 dies:

Before:
Consumer 1 ──► P0 ✅ Consumer 2 ──► P1 ✅ Consumer 3 ──► idle

Consumer 1 crashes → Kafka triggers REBALANCE:
Consumer 2 ──► P1 ✅ Consumer 3 ──► P0 ✅ (Consumer 1 gone)

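Without an idle consumer, the rebalance hands P0 to Consumer 2, which then handles both partitions until a replacement starts. With an idle consumer, the standby takes over P0 and per-consumer load stays the same. Failover is not instantaneous in either case: the group must first detect the failure (session.timeout.ms) and then complete the rebalance.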
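
The assignment rules and the crash scenario above can be simulated without a broker. This sketch spreads partitions round-robin over the live consumers; it is not one of Kafka's real assignors, so which consumer ends up with which partition may differ from what a real cluster does. All names are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupAssignmentSketch {

    // Spread partitions round-robin over the live consumers. A consumer that ends
    // up with an empty list is an idle standby.
    static Map<String, List<Integer>> assign(List<String> consumers, int partitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return assignment;
        }
        for (int p = 0; p < partitions; p++) {
            assignment.get(consumers.get(p % consumers.size())).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> group = new ArrayList<>(List.of("consumer-1", "consumer-2", "consumer-3"));
        System.out.println("before: " + assign(group, 2));

        group.remove("consumer-1"); // simulate the crash; the group rebalances
        System.out.println("after:  " + assign(group, 2));
    }
}
```

Before the crash one consumer has no partitions; after it, every survivor has exactly one, which is the point of keeping a standby.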

Rebalancing

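When group membership changes (a consumer joins, leaves, or crashes), Kafka rebalances the group. Under the classic eager protocol, every consumer gives up its partitions and all of them are reassigned: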

// Our consumers implement ConsumerSeekAware to log assignments
@Override
public void onPartitionsAssigned(Map<TopicPartition, Long> assignments,
                                 ConsumerSeekCallback callback) {
    if (assignments.isEmpty()) {
        log.warn("[SMS] ⏸ IDLE | thread={}", Thread.currentThread().getName());
    } else {
        assignments.keySet().forEach(tp ->
            log.info("[SMS] 🎯 ASSIGNED | partition={}", tp.partition()));
    }
}

6. Deep Dive: Delivery Guarantees

The acks setting — producer side

Controls how many brokers must acknowledge before the producer considers the write successful.

acks=0 Fire and forget
Producer ──write──► Broker 1
◄──────── (no wait)
Risk: silent data loss

acks=1 Leader acknowledgement
Producer ──write──► Broker 1 (leader) ──► ACK
└── replicates async ──► Broker 2, 3
Risk: data loss if leader crashes before replication completes

acks=all All in-sync replicas
Producer ──write──► Broker 1 ──replicate──► Broker 2 ──► ACK
└──replicate──► Broker 3 ──► ACK
◄──────────────────────────────────────────────────
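Risk: lowest of the three. Survives a single broker failure as long as min.insync.replicas ≥ 2; if only the leader is in sync and min.insync.replicas=1, it degrades to acks=1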

min.insync.replicas — broker/topic side

The minimum number of ISRs a partition must have before accepting writes with acks=all.

# docker-compose-cluster.yml
KAFKA_MIN_INSYNC_REPLICAS: 2

dispatch-topic RF=3:
ISRs = [Broker1, Broker2, Broker3] # 3 ISRs
min.insync.replicas = 2
acks=all → needs 3 ACKs → 3 ≥ 2 ✅ → WRITE ACCEPTED

Broker 3 goes down:
ISRs = [Broker1, Broker2] # 2 ISRs
min.insync.replicas = 2
acks=all → needs 2 ACKs → 2 ≥ 2 ✅ → WRITE ACCEPTED (cluster still healthy)

Broker 2 also goes down:
ISRs = [Broker1] # 1 ISR
min.insync.replicas = 2
acks=all → 1 < 2 ❌ → NOT_ENOUGH_REPLICAS (cluster protects data integrity)
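
The three scenarios above reduce to one comparison. The sketch below models only that rule (ISR count versus min.insync.replicas for acks=all writes); the enum and method names are assumptions made for this example, not Kafka APIs.

```java
final class MinIsrSketch {

    enum Acks { NONE, LEADER, ALL }

    // Only acks=all writes are checked against min.insync.replicas.
    static boolean writeAccepted(Acks acks, int inSyncReplicas, int minInSyncReplicas) {
        if (inSyncReplicas < 1) {
            return false; // no leader available at all
        }
        if (acks != Acks.ALL) {
            return true;
        }
        return inSyncReplicas >= minInSyncReplicas;
    }

    public static void main(String[] args) {
        // RF=3, min.insync.replicas=2, brokers failing one by one.
        for (int isr = 3; isr >= 1; isr--) {
            System.out.println("ISR=" + isr + " accepted=" + writeAccepted(Acks.ALL, isr, 2));
        }
    }
}
```

Note that with acks=1 the last case would still be accepted, which is exactly the durability gap that acks=all plus min.insync.replicas=2 closes.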

ErrorHandlingDeserializer — consumer resilience

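When a consumer encounters a message it can’t deserialize (wrong format, wrong class name), the failure happens inside poll(), before your listener or error handler ever sees the record. The consumer keeps failing on the same record and stops making progress (a "poison pill"). Wrapping the real deserializer in ErrorHandlingDeserializer catches the exception and hands it to the container’s error handler, so the bad record can be logged and skipped: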

// notification-service: KafkaConsumerConfig.java
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ErrorHandlingDeserializer.class);
props.put(ErrorHandlingDeserializer.VALUE_DESERIALIZER_CLASS, JsonDeserializer.class);
props.put(JsonDeserializer.VALUE_DEFAULT_TYPE,
          "in.asvignesh.notification.model.NotificationEvent");
props.put(JsonDeserializer.USE_TYPE_INFO_HEADERS, false);  // ignore __TypeId__ header

Why USE_TYPE_INFO_HEADERS=false?

By default, Spring’s JsonSerializer adds a __TypeId__ header with the fully-qualified class name:

Producer in order-service publishes:
  Header: __TypeId__ = in.asvignesh.order.model.Order

Consumer in notification-service sees:
  "I need to load class in.asvignesh.order.model.Order"
  ClassNotFoundException → SerializationException → crash

Disabling type headers and specifying VALUE_DEFAULT_TYPE means the consumer always deserializes to NotificationEvent regardless of what the producer says.
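
The idea behind wrapping a deserializer can be shown without Spring. The sketch below is not Spring's implementation; it only illustrates the pattern of catching a decoding failure and returning it as data so the caller can skip the record. `Result`, `wrap`, and the integer-parsing delegate are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

final class SafeDeserializerSketch {

    // Holds either the decoded value or the failure, never both.
    record Result<T>(T value, Exception error) {
        boolean failed() {
            return error != null;
        }
    }

    // Wraps a delegate so that a bad payload becomes a Result instead of an exception.
    static <T> Function<byte[], Result<T>> wrap(Function<byte[], T> delegate) {
        return bytes -> {
            try {
                return new Result<>(delegate.apply(bytes), null);
            } catch (RuntimeException e) {
                return new Result<>(null, e);
            }
        };
    }

    public static void main(String[] args) {
        // Hypothetical delegate: parses the payload as an integer.
        Function<byte[], Integer> parseInt =
            bytes -> Integer.parseInt(new String(bytes, StandardCharsets.UTF_8));
        Function<byte[], Result<Integer>> safe = wrap(parseInt);

        for (String payload : new String[] {"42", "not-a-number", "7"}) {
            Result<Integer> result = safe.apply(payload.getBytes(StandardCharsets.UTF_8));
            System.out.println(result.failed()
                ? "skipped bad record: " + payload
                : "processed " + result.value());
        }
    }
}
```

The loop keeps going after the bad payload, which is the behavior you want from a consumer facing a single malformed record.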


7. Fan-out Patterns

Fan-out is the ability for one produced message to trigger multiple independent downstream processes.

Pattern A: Multiple Consumer Groups (simplest)

This is what dispatch-topic already uses.

dispatch-topic

├──► notification-sms (own offset, own scaling) → 📱 SMS
└──► notification-email (own offset, own scaling) → 📧 Email

Adding push notifications = add one consumer group, zero other changes:
└──► notification-push (new group; reads from offset 0 only if auto.offset.reset=earliest) → 📲 Push

Best for: Independent consumers with same message format.

Pattern B: Intermediate Processor Fan-out (what notification-service does)

notification-topic [NotificationEvent]


notification-service
│ Creates channel-specific DispatchEvents
├──► dispatch-topic [DispatchEvent{channel=SMS}]
└──► dispatch-topic [DispatchEvent{channel=EMAIL}]

Best for: When downstream consumers need enriched, transformed, or channel-specific messages.
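
A minimal sketch of the transformation step in Pattern B follows. The article does not show the real event classes, so the record fields and payload formats here are assumptions chosen for illustration; publishing to Kafka is left out so the example stays self-contained.

```java
import java.util.List;

final class FanOutSketch {

    enum Channel { SMS, EMAIL }

    // Field names are assumptions; the real NotificationEvent/DispatchEvent may differ.
    record NotificationEvent(String orderId, String message) {}
    record DispatchEvent(String orderId, Channel channel, String payload) {}

    // One incoming business event becomes one delivery instruction per channel.
    static List<DispatchEvent> fanOut(NotificationEvent event) {
        return List.of(
            new DispatchEvent(event.orderId(), Channel.SMS, "SMS: " + event.message()),
            new DispatchEvent(event.orderId(), Channel.EMAIL, "EMAIL: " + event.message()));
    }

    public static void main(String[] args) {
        fanOut(new NotificationEvent("abc-123", "Order placed")).forEach(System.out::println);
    }
}
```

Keeping the orderId on every DispatchEvent means it can be reused as the message key, so per-order ordering carries over to dispatch-topic.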

Pattern C: Topic-per-Consumer

order-service
├──► sms-topic → notification-sms
├──► email-topic → notification-email
└──► push-topic → notification-push

Best for: Different schemas per consumer, different retention/partition requirements.
Avoid when: Producer shouldn’t know about consumers (tight coupling).

Pattern D: Event Router

events-topic [all events]


[Router Service]
├── ORDER_PLACED ──► order-placed-topic
├── ORDER_SHIPPED ──► order-shipped-topic
└── CANCELLED ──► cancellation-topic

Best for: Content-based routing to specialized consumers.
Avoid when: Router becomes a bottleneck or single point of failure.

Pattern E: Aggregator (reverse fan-out)

The opposite of fan-out — merge multiple streams into one.

sms-results-topic ─┐
email-results-topic ─┼──► [Aggregator] ──► notification-audit-topic
push-results-topic ─┘

Best for: Audit logs, metrics aggregation, saga orchestration.

Choosing a fan-out pattern

Same message format for all consumers?
YES → Pattern A (multiple consumer groups)
NO → Pattern B or C

Producer should be decoupled from consumers?
YES → Pattern A or B
NO → Pattern C

Need content-based routing?
YES → Pattern D
NO → Pattern A

Need to merge multiple streams?
YES → Pattern E


8. Multi-Broker Clustering

KRaft mode (no Zookeeper)

Our cluster uses Kafka’s built-in KRaft consensus (early access since Kafka 2.8, production-ready for new clusters since 3.3):

# docker-compose-cluster.yml
KAFKA_PROCESS_ROLES: broker,controller   # each node is both
KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093

kafka-1 ◄──► kafka-2 ◄──► kafka-3
  (KRaft quorum — majority vote for leader election)
  
  3 nodes → majority = 2 → can survive 1 node failure
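
The quorum arithmetic generalizes to any number of voters. A small sketch, with illustrative names:

```java
final class QuorumSketch {

    // Smallest number of voters that forms a strict majority.
    static int majority(int voters) {
        return voters / 2 + 1;
    }

    // How many voters can fail while a majority is still reachable.
    static int tolerableFailures(int voters) {
        return voters - majority(voters);
    }

    public static void main(String[] args) {
        for (int voters : new int[] {3, 4, 5}) {
            System.out.println(voters + " voters: majority=" + majority(voters)
                + ", tolerates " + tolerableFailures(voters) + " failure(s)");
        }
    }
}
```

Four voters tolerate no more failures than three, which is why controller quorums are normally sized at odd numbers.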

The listener misconfiguration that caused coordinator errors

# BROKEN: all three brokers advertise localhost
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092

# Inside Docker, localhost is the container itself.
# When the cluster metadata says "the coordinator is at localhost:9094",
# a broker or containerized client that follows it connects to its own localhost:9094 → fails
# Result: COORDINATOR_NOT_AVAILABLE

# FIXED: separate internal and external listeners (shown for kafka-1)
# Custom listener names also need entries in KAFKA_LISTENER_SECURITY_PROTOCOL_MAP (not shown)
KAFKA_LISTENERS: EXTERNAL://0.0.0.0:9092,INTERNAL://0.0.0.0:29092,CONTROLLER://0.0.0.0:9093
KAFKA_ADVERTISED_LISTENERS: EXTERNAL://localhost:9092,INTERNAL://kafka-1:29092
KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL

# Brokers talk to each other via kafka-1:29092 (Docker DNS resolves container names)
# Spring Boot apps connect via localhost:9092 (host port mapping)

9. Offset Management and Message Retention

Kafka is a log, not a queue

After consumer reads a message → message STAYS in the log
Consumer's offset pointer moves forward
Message deleted only when retention expires

Offset commit strategies

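// Auto-commit (the plain Kafka client default; Spring Kafka turns it off unless you set it). Risky:
spring.kafka.consumer.enable-auto-commit=true
// Problem: offsets are committed on a timer (auto.commit.interval.ms),
// not when your processing succeeds.
// If an offset is committed and the app crashes before processing finishes → message lost

// Manual commit (safer):
factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
// Then in the listener, once processing has succeeded:
acknowledgment.acknowledge();
// A crash before acknowledge() means the record is redelivered, not lost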
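
The difference between the two strategies shows up only when something crashes. The simulation below is a toy model, not Kafka client code: it runs a single consumer over an in-memory list, crashes once at a chosen offset, and then resumes from the last committed offset.

```java
import java.util.ArrayList;
import java.util.List;

final class CommitOrderSketch {

    // Simulates one crash while handling the message at crashIndex, followed by a
    // restart from the last committed offset. Returns what was actually processed.
    static List<String> run(List<String> messages, int crashIndex, boolean commitBeforeProcessing) {
        List<String> processed = new ArrayList<>();
        int committed = 0;
        boolean crashed = false;
        while (committed < messages.size()) {
            int offset = committed;
            if (commitBeforeProcessing) {
                committed = offset + 1;
            }
            if (!crashed && offset == crashIndex) {
                crashed = true; // crash before processing finishes
                continue;       // restart: resume from the committed offset
            }
            processed.add(messages.get(offset));
            if (!commitBeforeProcessing) {
                committed = offset + 1;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> messages = List.of("m0", "m1", "m2");
        System.out.println("commit first:  " + run(messages, 1, true));  // m1 is lost
        System.out.println("process first: " + run(messages, 1, false)); // m1 is retried
    }
}
```

Committing after processing gives at-least-once delivery: nothing is lost, but a crash between processing and commit (not modeled here) would cause a message to be processed twice, so handlers should be idempotent.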

Retention policies

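# Time-based (default: 7 days)
kafka-configs --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name dispatch-topic \
  --add-config retention.ms=86400000    # 1 day

# Size-based
kafka-configs --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name dispatch-topic \
  --add-config retention.bytes=1073741824  # 1 GB per partition

# Compact (keep only the latest value per key; useful for state)
kafka-configs --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name dispatch-topic \
  --add-config cleanup.policy=compact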
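
The raw numbers in these settings are easy to mistype. A small sketch, with illustrative helper names, for deriving them instead of hand-computing:

```java
import java.time.Duration;

final class RetentionValuesSketch {

    // retention.ms expects milliseconds.
    static long retentionMs(Duration retention) {
        return retention.toMillis();
    }

    // retention.bytes expects bytes; this helper converts from GiB (1024^3 bytes).
    static long gibibytes(long count) {
        return count * 1024L * 1024L * 1024L;
    }

    public static void main(String[] args) {
        System.out.println("retention.ms=" + retentionMs(Duration.ofDays(1)));
        System.out.println("retention.bytes=" + gibibytes(1));
    }
}
```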

Replay — the superpower

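# Reset a consumer group to the beginning of a topic
# (stop the group's consumers first; offsets can only be reset while the group is inactive)
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --group notification-sms \
  --topic dispatch-topic \
  --reset-offsets --to-earliest --execute

# Reset to a specific timestamp (e.g., to replay everything since 12:00 that day)
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --group notification-sms \
  --topic dispatch-topic \
  --reset-offsets --to-datetime 2024-05-21T12:00:00.000 --execute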

Use cases for replay:

  • Bug fix deployed → replay affected messages
  • New downstream service → read full history
  • Analytics job → reprocess all historical events

10. Architect-Level Decision Guide

Should you use Kafka for this?

Is throughput > 10,000 msg/sec? YES → Kafka
Do you need multiple independent consumers? YES → Kafka
Do you need message replay? YES → Kafka
Do you need event sourcing? YES → Kafka

Is it a simple task queue? YES → SQS / RabbitMQ
Do you need request-reply? YES → gRPC / REST
Is message volume very low? YES → Overkill, use simpler tools

Partition count decisions

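Start with: partitions = number of consumers you expect at peak
Grow rule: you can increase partitions later, but never decrease them
(Kafka does not support shrinking a topic, and increasing changes hash(key) % numPartitions,
so existing keys may move to a different partition; size generously up front)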

notification-topic: 2 (intake is low, 3 consumers demo idle)
dispatch-topic: 5 (higher volume, demonstrate scaling)
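
Worth keeping in mind when planning growth: under hash-modulo partitioning, changing the partition count changes where existing keys land. The sketch below counts how many keys move; it uses `String.hashCode` as a stand-in for Kafka's murmur2 hash, and the key names are made up.

```java
import java.util.ArrayList;
import java.util.List;

final class RepartitionSketch {

    // Stand-in for the default partitioner (Kafka hashes the key bytes with murmur2).
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Counts keys whose partition differs between the old and new partition counts.
    static long movedKeys(List<String> keys, int before, int after) {
        return keys.stream()
            .filter(key -> partitionFor(key, before) != partitionFor(key, after))
            .count();
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add("order-" + i);
        }
        System.out.println("keys that change partition going 5 -> 6: " + movedKeys(keys, 5, 6));
    }
}
```

Any key that moves can have its older events on one partition and its newer events on another, so per-key ordering is only guaranteed for events produced under the same partition count.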

Replication and durability decisions

Development / demo: RF=1, acks=1 (speed over safety)
Staging: RF=2, acks=1 (some redundancy)
Production: RF=3, acks=all (full durability)
min.insync.replicas=2

Financial / critical: RF=3, acks=all (+ enable idempotent producer)
min.insync.replicas=2
enable.idempotence=true

Consumer group sizing

Peak throughput / partition throughput = required partitions = max useful consumers

notification-sms: 5 partitions → max 5 active consumers
Add 1 idle as a warm standby for failover → deploy 6

notification-email: 5 partitions → max 5 active consumers
Add 3 idle (more standby headroom) → deploy 8

Architecture Summary


Built with Spring Boot 3.3, Spring Kafka 3.2, Apache Kafka 3.7, KRaft mode
