Building Event-Driven Data Pipelines with Apache Kafka
How to design, build, and operate reliable event-driven architectures with Kafka — from schema design to exactly-once processing.
Apache Kafka has become the backbone of modern data architecture. At Vaarak, we've deployed Kafka clusters processing billions of events per day for logistics tracking, financial transactions, IoT telemetry, and real-time analytics. But Kafka's power comes with complexity — getting it right requires careful design decisions upfront.
Topic Design: Get This Right First
Topic design is the most consequential decision in a Kafka deployment. Topics define your data contracts, partitioning strategy, and retention policy. A poorly designed topic structure leads to hot partitions, consumer group rebalancing storms, and ordering guarantees that are impossible to maintain.
- One topic per event type (order.created, order.updated, payment.processed) — not one topic per entity with a type field
- Choose partition keys carefully — use the entity ID (orderId, userId) to guarantee ordering within an entity
- Set partition count based on expected throughput — each partition can handle ~10MB/s; over-partition early since you can't reduce partition count later
- Use Avro or Protobuf schemas registered in Schema Registry — JSON is flexible but schema evolution is a nightmare without a registry
import { Kafka, Partitioners } from 'kafkajs';
import { SchemaRegistry } from '@kafkajs/confluent-schema-registry';

const kafka = new Kafka({ brokers: ['kafka-1:9092', 'kafka-2:9092', 'kafka-3:9092'] });
const registry = new SchemaRegistry({ host: 'http://schema-registry:8081' });
const producer = kafka.producer({ createPartitioner: Partitioners.DefaultPartitioner });

await producer.connect(); // Connect once at startup, not per message

async function publishOrderEvent(event: OrderEvent) {
  // Encode against the latest registered schema so consumers can decode it
  const schemaId = await registry.getLatestSchemaId('order-events-value');
  const encodedValue = await registry.encode(schemaId, event);

  await produc.send === undefined ? undefined : undefined;
  await producer.send({
    topic: 'order-events',
    messages: [{
      key: event.orderId, // Partition key = orderId for ordering guarantee
      value: encodedValue,
      headers: {
        'event-type': event.type,
        'correlation-id': event.correlationId,
        'timestamp': Date.now().toString(),
      },
    }],
  });
}

Consumer Patterns: Idempotency and Exactly-Once
Kafka guarantees at-least-once delivery, which means consumers must be idempotent — processing the same message twice should produce the same result. The simplest approach is an idempotency key: store the event ID in your database within the same transaction as your business logic. If the event ID already exists, skip processing.
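The idempotency-key check can be sketched in a few lines. This is a minimal in-memory version: the Set stands in for a database table of processed event IDs, and in production the lookup and the business write must share one database transaction. The function and variable names here are illustrative, not part of any library API.

```typescript
// Idempotent-processing sketch. `seen` stands in for a DB table of
// processed event IDs; in real code the check, the handler's writes, and
// the key insert happen inside one database transaction.
type Handler = () => void;

function processOnce(eventId: string, seen: Set<string>, handler: Handler): boolean {
  if (seen.has(eventId)) return false; // duplicate delivery: skip silently
  handler();                           // business logic
  seen.add(eventId);                   // record the idempotency key
  return true;
}
```

Because Kafka may redeliver after a consumer crash or rebalance, the second delivery of the same event becomes a no-op instead of a double-charge.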
Exactly-once semantics in Kafka means exactly-once within the Kafka ecosystem (producer → topic → consumer). For end-to-end exactly-once across system boundaries, you need idempotent consumers on the read side (Kafka → external database) and a transactional outbox pattern on the write side (database → Kafka).
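The outbox pattern's shape can be shown with in-memory stands-ins for the database tables. All names here are illustrative; in a real system the two writes in saveOrderWithOutbox happen in one DB transaction, and the relay publishes via a Kafka producer rather than a callback.

```typescript
// Transactional-outbox sketch. `orders` and `outbox` stand in for DB tables.
interface OutboxRow { topic: string; key: string; value: string; }

const orders: Map<string, string> = new Map();
const outbox: OutboxRow[] = [];

// In production both writes share one DB transaction, so the event is
// recorded if and only if the business state change commits.
function saveOrderWithOutbox(orderId: string, state: string): void {
  orders.set(orderId, state);
  outbox.push({ topic: 'order-events', key: orderId, value: state });
}

// A separate relay process drains the outbox and publishes to Kafka.
// Returns how many rows were published.
function drainOutbox(publish: (row: OutboxRow) => void): number {
  const drained = outbox.splice(0, outbox.length);
  drained.forEach(publish);
  return drained.length;
}
```

The relay may crash and re-publish a row, which is why the consuming side still needs the idempotency-key check: the outbox guarantees the event is never lost, idempotent consumption guarantees it is never applied twice.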
Dead Letter Queues and Error Handling
Every production Kafka consumer needs a dead letter queue (DLQ). When a message fails processing after N retries, it's moved to a DLQ topic for manual investigation. Without a DLQ, a single poison message blocks the entire consumer group — one bad event stops all event processing.
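The retry-then-park flow looks roughly like this. It is a sketch under simplifying assumptions: the DLQ is an in-memory array where a real consumer would produce the failed message (with error details in headers) to a separate DLQ topic, and maxRetries is an illustrative default.

```typescript
// Retry-then-DLQ sketch. `dlq` stands in for producing to a '<topic>.dlq'
// topic; in real code the failed payload and error go into message headers.
interface FailedMessage { value: string; error: string; }

function handleWithDlq(
  value: string,
  handler: (v: string) => void,
  dlq: FailedMessage[],
  maxRetries = 3,
): boolean {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      handler(value);
      return true; // processed successfully
    } catch (err) {
      if (attempt === maxRetries) {
        // Retries exhausted: park the message for manual investigation
        // instead of blocking the partition behind it.
        dlq.push({ value, error: String(err) });
      }
    }
  }
  return false;
}
```

The key property is that handleWithDlq always returns, so the consumer can commit the offset and move on; the poison message waits in the DLQ rather than in front of every healthy message behind it.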
Monitoring and Alerting
The most critical Kafka metric is consumer lag — the difference between the latest produced offset and the latest consumed offset. Rising consumer lag means your consumers can't keep up with producers, and you're falling behind on processing. Alert on sustained lag increases, not absolute values (a brief spike during a deployment is normal).
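Lag itself is simple arithmetic once you have the offsets. A sketch of the computation, assuming the two inputs come from the cluster (with kafkajs, the admin client's fetchTopicOffsets and fetchOffsets calls can supply them); the function name is illustrative.

```typescript
// Consumer lag per partition = latest produced offset minus the group's
// committed offset. Inputs are maps from partition number to offset.
function totalLag(
  latest: Record<number, number>,
  committed: Record<number, number>,
): number {
  return Object.keys(latest).reduce((sum, p) => {
    const partition = Number(p);
    // A partition the group has never committed counts as fully lagged.
    const consumed = committed[partition] ?? 0;
    return sum + Math.max(0, latest[partition] - consumed);
  }, 0);
}
```

Feed this into your alerting as a time series and alert on its derivative over a sustained window, which matches the advice above: a deployment blip is normal, a steadily climbing slope is not.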
“Kafka is not a database, a message queue, or a pub/sub system — it's an event log. Design your architecture around this mental model, and the patterns (event sourcing, CQRS, stream processing) follow naturally.”
— Sarah Chen, Vaarak Infrastructure
Sarah Chen
Cloud Infrastructure Architect