7. Message Queues & Asynchronous Processing
Message queues enable asynchronous communication between services by decoupling producers (senders) from consumers (receivers). They are essential for building scalable, resilient, and loosely coupled distributed systems.
Why Message Queues?
Synchronous Communication Problems
Client → Service A → Service B → Service C → Response
↑
If slow or down,
entire chain blocks
Asynchronous Communication with Queues
Client → Service A → [Message Queue] → Service B (processes later)
↓
Immediate ACK
Benefits:
- Decoupling: Producer doesn't need to know about the consumer.
- Asynchronous processing: Producer doesn't wait for the consumer.
- Load leveling: Queue absorbs traffic spikes; consumers process at their own pace.
- Reliability: Messages persist even if consumers are temporarily down.
- Scalability: Add more consumers independently to handle load.
- Retry & fault tolerance: Failed messages can be retried.
Core Concepts
Message
The unit of data sent between producer and consumer.
{
"id": "msg-123",
"type": "order.created",
"timestamp": "2025-01-15T10:30:00Z",
"payload": {
"orderId": "order-456",
"userId": "user-789",
"amount": 99.99
}
}
Producer (Publisher)
The service that sends messages to the queue.
Consumer (Subscriber)
The service that receives and processes messages from the queue.
Queue / Topic
The channel through which messages flow.
Broker
The server that manages queues, routing, persistence, and delivery.
Messaging Models
1. Point-to-Point (Queue)
Each message is consumed by exactly one consumer. Multiple consumers can listen, but only one gets each message.
Producer → [Queue] → Consumer A (gets message 1)
→ Consumer B (gets message 2)
→ Consumer C (gets message 3)
Use cases: Task distribution, job processing, order processing.
2. Publish-Subscribe (Pub/Sub)
Each message is delivered to all subscribers of a topic.
Producer → [Topic] → Subscriber A (gets message)
→ Subscriber B (gets message)
→ Subscriber C (gets message)
Use cases: Event broadcasting, notifications, real-time updates.
3. Consumer Groups
A hybrid: a topic has multiple consumer groups. Each group acts as a single logical subscriber — within a group, each message goes to only one consumer.
Producer → [Topic] → Consumer Group 1: [C1a, C1b] → Each message to one consumer
→ Consumer Group 2: [C2a, C2b] → Each message to one consumer
Use cases: Multiple services processing the same stream of events independently.
Delivery Guarantees
| Guarantee | Description | Risk |
|---|---|---|
| At-most-once | Message delivered zero or one times. No retries. | Message loss possible |
| At-least-once | Message delivered one or more times. Retries on failure. | Duplicate processing possible |
| Exactly-once | Message delivered exactly one time. | Hard to achieve; requires idempotency |
Achieving Exactly-Once Semantics
True exactly-once delivery is extremely difficult. In practice, it's achieved via:
-
Idempotent consumers: Processing the same message twice produces the same result.
// Bad: total += amount (not idempotent) // Good: SET order:123:status = "paid" (idempotent) -
Deduplication: Track processed message IDs.
if message_id in processed_set: skip() else: process(message) processed_set.add(message_id) -
Transactional outbox: Write message and business data in the same DB transaction.
Message Ordering
| Ordering | Description | Systems |
|---|---|---|
| No ordering | Messages may arrive in any order | Basic SQS |
| Partition ordering | Ordered within a partition/shard | Kafka (per partition), Kinesis |
| Total ordering | All messages globally ordered | Single partition, SQS FIFO |
Kafka's Ordering Model
Topic: "orders" with 3 partitions
Partition 0: [msg1, msg4, msg7] → Ordered within partition
Partition 1: [msg2, msg5, msg8] → Ordered within partition
Partition 2: [msg3, msg6, msg9] → Ordered within partition
Partition key: order_id → Same order always goes to same partition → Ordered per order
Popular Message Queue Technologies
Apache Kafka
A distributed event streaming platform designed for high-throughput, durable, ordered message processing.
Architecture:
Producers → [Kafka Broker Cluster] → Consumers
│
┌──────────┼──────────┐
│ │ │
[Broker 1] [Broker 2] [Broker 3]
│ │ │
Partitions Partitions Partitions
Key concepts:
- Topics: Logical categories of messages.
- Partitions: Topics are split into ordered, immutable logs.
- Offsets: Each message has a sequential ID within its partition.
- Consumer groups: Parallel consumption with load balancing.
- Replication: Each partition has configurable replicas across brokers.
- Retention: Messages persist for a configurable duration (not deleted on consumption).
Characteristics:
| Feature | Value |
|---------|-------|
| Throughput | Millions of messages/sec |
| Latency | Single-digit ms |
| Durability | Disk-based, replicated |
| Ordering | Per partition |
| Retention | Time-based or size-based (configurable) |
| Protocol | Custom TCP protocol |
Best for: Event streaming, log aggregation, real-time analytics, change data capture (CDC).
RabbitMQ
A traditional message broker implementing AMQP (Advanced Message Queuing Protocol).
Architecture:
Producer → [Exchange] → Binding → [Queue] → Consumer
Exchange types:
| Type | Routing |
|------|---------|
| Direct | Routes to queues with matching routing key |
| Fanout | Routes to all bound queues (broadcast) |
| Topic | Routes based on pattern matching (e.g., order.*) |
| Headers | Routes based on message header attributes |
Producer → [Topic Exchange]
├── order.created → Queue A (Order Service)
├── order.* → Queue B (Analytics Service)
└── *.created → Queue C (Notification Service)
Characteristics:
| Feature | Value |
|---------|-------|
| Throughput | Tens of thousands/sec |
| Latency | Sub-ms |
| Durability | Optional (persistent or transient) |
| Ordering | Per queue (FIFO) |
| Protocol | AMQP, STOMP, MQTT |
| Acknowledgment | Per-message ACK/NACK |
Best for: Complex routing, task queues, RPC, traditional enterprise messaging.
Amazon SQS (Simple Queue Service)
A fully managed message queue service from AWS.
Types:
| Type | Throughput | Ordering | Deduplication |
|------|-----------|----------|---------------|
| Standard | Unlimited | Best-effort | At-least-once |
| FIFO | 300 msg/sec (3000 with batching) | Strict FIFO | Exactly-once |
Key features:
- Fully managed (no servers to maintain).
- Automatic scaling.
- Dead-letter queues for failed messages.
- Long polling to reduce empty responses.
- Message visibility timeout (prevents duplicate processing).
Best for: AWS-native applications, simple decoupling, serverless architectures.
Amazon SNS (Simple Notification Service)
A fully managed pub/sub service.
Publisher → [SNS Topic] → SQS Queue
→ Lambda Function
→ HTTP Endpoint
→ Email
→ SMS
Often used with SQS in a fan-out pattern:
Event → [SNS Topic] → [SQS Queue 1] → Order Service
→ [SQS Queue 2] → Notification Service
→ [SQS Queue 3] → Analytics Service
Comparison
| Feature | Kafka | RabbitMQ | SQS |
|---|---|---|---|
| Model | Distributed log | Message broker | Managed queue |
| Throughput | Very high (millions/sec) | Moderate (tens of thousands) | Scales automatically |
| Ordering | Per partition | Per queue | FIFO or best-effort |
| Retention | Configurable (days/weeks) | Until consumed | 14 days max |
| Replay | Yes (re-read from any offset) | No (consumed = gone) | No |
| Routing | Topic + partition key | Exchanges + routing keys | Simple queue |
| Managed | Self-hosted or Confluent Cloud | Self-hosted or CloudAMQP | Fully managed (AWS) |
| Best for | Event streaming, analytics | Complex routing, RPC | Simple decoupling |
Patterns
Dead Letter Queue (DLQ)
Messages that fail processing after
Main Queue → Consumer → Fails → Retry 1 → Fails → Retry 2 → Fails → DLQ
Outbox Pattern
Ensures messages are sent reliably alongside database transactions.
1. BEGIN transaction
2. INSERT INTO orders (...)
3. INSERT INTO outbox (message_payload, status='pending')
4. COMMIT transaction
Background process:
5. SELECT * FROM outbox WHERE status='pending'
6. Publish to message queue
7. UPDATE outbox SET status='sent'
Saga (Choreography)
Services communicate through events in a message queue to execute a distributed transaction.
Order Created → [Queue] → Payment Service → Payment Completed → [Queue] → Inventory Service → ...
Competing Consumers
Multiple consumers listen on the same queue for load balancing.
[Queue] → Consumer 1 (gets msg 1, 4, 7)
→ Consumer 2 (gets msg 2, 5, 8)
→ Consumer 3 (gets msg 3, 6, 9)
Priority Queue
Messages are processed based on priority, not just arrival order.
High priority: [P1 Queue] → Consumer (processes first)
Normal priority: [P2 Queue] → Consumer (processes second)
Low priority: [P3 Queue] → Consumer (processes last)
Backpressure
When consumers can't keep up with producers:
| Strategy | Description |
|---|---|
| Drop messages | Oldest or lowest-priority messages are discarded |
| Block producer | Producer blocks until queue has space |
| Buffer to disk | Overflow messages go to disk (slower but durable) |
| Scale consumers | Auto-add more consumer instances |
| Rate limit producers | Slow down incoming message rate |
Key Metrics to Monitor
| Metric | What It Tells You |
|---|---|
| Queue depth | How many messages are waiting — growing = consumers too slow |
| Consumer lag | How far behind consumers are from latest messages |
| Processing rate | Messages consumed per second |
| Error rate | Failed message processing attempts |
| DLQ depth | Number of messages that couldn't be processed |
| Message age | How long messages wait before being consumed |
Summary
| Concept | Key Point |
|---|---|
| Purpose | Decouple services, absorb spikes, enable async processing |
| Models | Point-to-point (one consumer) vs Pub/Sub (all subscribers) |
| Delivery | At-least-once is the practical default; use idempotency for safety |
| Kafka | High-throughput event streaming with replay capability |
| RabbitMQ | Flexible routing with exchanges for complex patterns |
| SQS | Fully managed, simple, integrates with AWS ecosystem |
| Patterns | DLQ, Outbox, Saga, Competing Consumers |
Rule of thumb: Use message queues whenever services don't need an immediate response, when you need to absorb traffic spikes, or when you want to decouple services for independent scaling and deployment.