07. Message Queues & Asynchronous Processing


Message queues enable asynchronous communication between services by decoupling producers (senders) from consumers (receivers). They are essential for building scalable, resilient, and loosely coupled distributed systems.

Why Message Queues?

Synchronous Communication Problems

Client → Service A → Service B → Service C → Response
                                      ↑
                              If slow or down,
                              entire chain blocks

Asynchronous Communication with Queues

Client → Service A → [Message Queue] → Service B (processes later)
              ↓
        Immediate ACK

Benefits:

Decoupling: Producer doesn't need to know about the consumer.
Asynchronous processing: Producer doesn't wait for the consumer.
Load leveling: Queue absorbs traffic spikes; consumers process at their own pace.
Reliability: Messages persist even if consumers are temporarily down.
Scalability: Add more consumers independently to handle load.
Retry & fault tolerance: Failed messages can be retried.

Core Concepts

Message

The unit of data sent between producer and consumer.

{
  "id": "msg-123",
  "type": "order.created",
  "timestamp": "2025-01-15T10:30:00Z",
  "payload": {
    "orderId": "order-456",
    "userId": "user-789",
    "amount": 99.99
  }
}
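A message envelope like the one above can be modeled as a small typed structure. The following is a minimal sketch, assuming a hypothetical `Message` class whose fields mirror the JSON example; the `msg-` ID prefix and the UTC ISO 8601 timestamp are illustrative choices, not requirements of any particular broker.

```python
import json
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class Message:
    """Hypothetical envelope: routing metadata plus an application payload."""
    type: str
    payload: dict
    # A unique ID lets consumers detect duplicate deliveries later on.
    id: str = field(default_factory=lambda: f"msg-{uuid.uuid4()}")
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialize the envelope for transport."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "Message":
        """Rebuild the envelope on the consumer side."""
        return cls(**json.loads(raw))
```

A producer would call `Message(type="order.created", payload={...}).to_json()` and hand the resulting string to the broker client; the consumer reverses this with `Message.from_json`.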

Producer (Publisher)

The service that sends messages to the queue.

Consumer (Subscriber)

The service that receives and processes messages from the queue.

Queue / Topic

The channel through which messages flow.

Broker

The server that manages queues, routing, persistence, and delivery.

Messaging Models

1. Point-to-Point (Queue)

Each message is consumed by exactly one consumer. Multiple consumers can listen, but only one gets each message.

Producer → [Queue] → Consumer A  (gets message 1)
                  → Consumer B  (gets message 2)
                  → Consumer C  (gets message 3)

Use cases: Task distribution, job processing, order processing.

2. Publish-Subscribe (Pub/Sub)

Each message is delivered to all subscribers of a topic.

Producer → [Topic] → Subscriber A  (gets message)
                  → Subscriber B  (gets message)
                  → Subscriber C  (gets message)

Use cases: Event broadcasting, notifications, real-time updates.

3. Consumer Groups

A hybrid: a topic has multiple consumer groups. Each group acts as a single logical subscriber — within a group, each message goes to only one consumer.

Producer → [Topic] → Consumer Group 1: [C1a, C1b] → Each message to one consumer
                  → Consumer Group 2: [C2a, C2b] → Each message to one consumer

Use cases: Multiple services processing the same stream of events independently.
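The three models differ only in who receives each message, which a toy in-memory topic can illustrate. This is a sketch under stated assumptions: the `Topic` class and its round-robin selection are hypothetical, and real brokers balance load differently (Kafka, for instance, assigns whole partitions to group members rather than picking a consumer per message).

```python
from itertools import cycle


class Topic:
    """Toy topic: every group sees every message; one member per group handles it."""

    def __init__(self):
        self._groups = {}  # group name -> round-robin iterator over its members

    def subscribe(self, group: str, members: list) -> None:
        self._groups[group] = cycle(members)

    def publish(self, message) -> dict:
        """Return which member of each group receives this message."""
        # Fan out across groups (pub/sub), load-balance inside each group
        # (point-to-point).
        return {group: next(members) for group, members in self._groups.items()}
```

A topic with a single group behaves like a point-to-point queue, while a topic in which every group has exactly one member behaves like plain pub/sub.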

Delivery Guarantees

| Guarantee | Description | Risk |
|-----------|-------------|------|
| At-most-once | Message delivered zero or one times. No retries. | Message loss possible |
| At-least-once | Message delivered one or more times. Retries on failure. | Duplicate processing possible |
| Exactly-once | Message delivered exactly one time. | Hard to achieve; requires idempotency |

Achieving Exactly-Once Semantics

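True exactly-once delivery is extremely difficult to guarantee across a network, because a sender cannot distinguish a lost message from a lost acknowledgment. In practice, systems combine at-least-once delivery with the following techniques so that each message takes effect exactly once: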

Idempotent consumers: Processing the same message twice produces the same result.

// Bad: total += amount (not idempotent)
// Good: SET order:123:status = "paid" (idempotent)

Deduplication: Track processed message IDs.

if message_id in processed_set:
    skip()
else:
    process(message)
    processed_set.add(message_id)

Transactional outbox: Write message and business data in the same DB transaction.
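The first two techniques, idempotent updates and deduplication, can be combined in one consumer. The sketch below is illustrative only: `PaymentConsumer` and its fields are hypothetical, and the in-memory set stands in for what would need to be a durable store (ideally updated in the same transaction as the business data) in a real system.

```python
class PaymentConsumer:
    """Hypothetical consumer that tolerates redelivered messages."""

    def __init__(self):
        self.order_status = {}      # idempotent state: setting a key twice is harmless
        self.processed_ids = set()  # deduplication record (in-memory for the sketch)
        self.total = 0.0            # non-idempotent state, protected by deduplication

    def handle(self, message: dict) -> bool:
        """Process a message; return False if it was a duplicate and skipped."""
        if message["id"] in self.processed_ids:
            return False
        self.order_status[message["orderId"]] = "paid"  # safe to repeat
        self.total += message["amount"]                 # unsafe to repeat
        self.processed_ids.add(message["id"])
        return True
```

Delivering the same message twice leaves `total` unchanged after the first call, which is the behavior at-least-once delivery requires from its consumers.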

Message Ordering

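| Ordering | Description | Systems |
|----------|-------------|---------|
| No ordering | Messages may arrive in any order | SQS Standard (best-effort ordering) |
| Partition ordering | Ordered within a partition/shard | Kafka (per partition), Kinesis (per shard) |
| Total ordering | All messages globally ordered | Single-partition topic; SQS FIFO with a single message group |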

Kafka's Ordering Model

Topic: "orders" with 3 partitions

Partition 0: [msg1, msg4, msg7]  → Ordered within partition
Partition 1: [msg2, msg5, msg8]  → Ordered within partition
Partition 2: [msg3, msg6, msg9]  → Ordered within partition

Partition key: order_id → Same order always goes to same partition → Ordered per order
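Key-based partitioning boils down to hashing the key and taking the result modulo the partition count. The following sketch uses CRC32 purely as a stable stand-in hash; it is not Kafka's algorithm (the Java client's default partitioner uses a murmur2 hash), and `partition_for` is a hypothetical helper name.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition key to a partition index in [0, num_partitions)."""
    # CRC32 is deterministic across processes, unlike Python's built-in
    # hash(), which is salted per process for strings.
    return zlib.crc32(key.encode("utf-8")) % num_partitions
```

Because the mapping depends on `num_partitions`, adding partitions to a topic changes where existing keys land, so per-key ordering only holds while the partition count stays fixed.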

Popular Message Queue Technologies

Apache Kafka

A distributed event streaming platform designed for high-throughput, durable, ordered message processing.

Architecture:

Producers → [Kafka Broker Cluster] → Consumers
                    │
         ┌──────────┼──────────┐
         │          │          │
    [Broker 1] [Broker 2] [Broker 3]
         │          │          │
    Partitions  Partitions  Partitions

Key concepts:

Topics: Logical categories of messages.
Partitions: Topics are split into ordered, immutable logs.
Offsets: Each message has a sequential ID within its partition.
Consumer groups: Parallel consumption with load balancing.
Replication: Each partition has configurable replicas across brokers.
Retention: Messages persist for a configurable duration (not deleted on consumption).
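Offsets, consumer groups, and retention fit together because a partition is an append-only log and each group merely tracks its own position in it. Here is a toy model, assuming a hypothetical `PartitionLog` class; it ignores replication, retention limits, and networking entirely.

```python
class PartitionLog:
    """Toy single-partition log: reading never deletes, groups track positions."""

    def __init__(self):
        self._records = []    # append-only; list index == offset
        self._committed = {}  # group name -> next offset to read

    def append(self, record) -> int:
        """Store a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group: str, max_records: int = 10) -> list:
        """Return (offset, record) pairs starting at the group's position."""
        start = self._committed.get(group, 0)
        return list(enumerate(self._records[start:start + max_records], start))

    def commit(self, group: str, next_offset: int) -> None:
        """Record progress so the group resumes here after a restart."""
        self._committed[group] = next_offset

    def seek(self, group: str, offset: int) -> None:
        """Rewind (or skip ahead) to replay from an arbitrary offset."""
        self._committed[group] = offset

    def lag(self, group: str) -> int:
        """Number of records the group has not yet committed."""
        return len(self._records) - self._committed.get(group, 0)
```

Since consumption only moves a per-group pointer, a second group can read the same records independently, and any group can rewind with `seek` to reprocess history.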

Characteristics:
| Feature | Value |
|---------|-------|
| Throughput | Millions of messages/sec |
| Latency | Single-digit ms |
| Durability | Disk-based, replicated |
| Ordering | Per partition |
| Retention | Time-based or size-based (configurable) |
| Protocol | Custom TCP protocol |

Best for: Event streaming, log aggregation, real-time analytics, change data capture (CDC).

RabbitMQ

A traditional message broker implementing AMQP (Advanced Message Queuing Protocol).

Architecture:

Producer → [Exchange] → Binding → [Queue] → Consumer

Exchange types:
| Type | Routing |
|------|---------|
| Direct | Routes to queues with matching routing key |
| Fanout | Routes to all bound queues (broadcast) |
| Topic | Routes by pattern matching on the routing key (e.g., order.*, where the asterisk matches exactly one word and # matches zero or more words) |
| Headers | Routes based on message header attributes |

Producer → [Topic Exchange]
               ├── order.created → Queue A (Order Service)
               ├── order.* → Queue B (Analytics Service)
               └── *.created → Queue C (Notification Service)
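The matching rule behind a topic exchange can be written in a few lines. This sketch reimplements the AMQP wildcard semantics (`*` matches exactly one dot-separated word, `#` matches zero or more) for illustration; it is not RabbitMQ's code, and `topic_matches`, `route`, and `BINDINGS` are hypothetical names that mirror the diagram above.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if a binding pattern matches a routing key."""

    def match(p: list, k: list) -> bool:
        if not p:
            return not k  # pattern exhausted: key must be exhausted too
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb zero or more words; try every split point.
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        return (head == "*" or head == k[0]) and match(rest, k[1:])

    return match(pattern.split("."), routing_key.split("."))


# Bindings from the diagram: pattern -> queue.
BINDINGS = {
    "order.created": "Queue A",
    "order.*": "Queue B",
    "*.created": "Queue C",
}


def route(routing_key: str) -> list:
    """List every queue whose binding pattern matches the routing key."""
    return [q for pattern, q in BINDINGS.items() if topic_matches(pattern, routing_key)]
```

With these bindings, a message published with the key `order.created` is copied to all three queues, while `order.shipped` reaches only Queue B.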

Characteristics:
| Feature | Value |
|---------|-------|
| Throughput | Tens of thousands/sec |
| Latency | Sub-ms |
| Durability | Optional (persistent or transient) |
| Ordering | Per queue (FIFO) |
| Protocol | AMQP, STOMP, MQTT |
| Acknowledgment | Per-message ACK/NACK |

Best for: Complex routing, task queues, RPC, traditional enterprise messaging.

Amazon SQS (Simple Queue Service)

A fully managed message queue service from AWS.

Types:
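| Type | Throughput | Ordering | Delivery |
|------|-----------|----------|----------|
| Standard | Nearly unlimited | Best-effort | At-least-once (duplicates possible) |
| FIFO | 300 API calls/sec per action (up to 3,000 msg/sec with batching; higher in high-throughput mode) | Strict FIFO within a message group | Exactly-once processing (5-minute deduplication window) |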

Key features:

Fully managed (no servers to maintain).
Automatic scaling.
Dead-letter queues for failed messages.
Long polling to reduce empty responses.
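Message visibility timeout (hides an in-flight message from other consumers until it is deleted or the timeout expires, which reduces but does not eliminate duplicate processing).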
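The visibility timeout is easiest to understand as a per-message "invisible until" timestamp. The simulation below is a sketch with assumed names (`VisibilityQueue`, integer message IDs, an explicit `now` argument instead of a real clock); the actual SQS API works with receipt handles and differs in many details.

```python
import itertools


class VisibilityQueue:
    """Toy queue where received messages stay hidden until deleted or timed out."""

    def __init__(self, visibility_timeout: float):
        self.visibility_timeout = visibility_timeout
        self._messages = {}            # message id -> [body, invisible_until]
        self._ids = itertools.count()

    def send(self, body) -> None:
        self._messages[next(self._ids)] = [body, 0.0]

    def receive(self, now: float):
        """Return (id, body) of the first visible message, or None."""
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                # Hide the message while the consumer works on it.
                entry[1] = now + self.visibility_timeout
                return msg_id, entry[0]
        return None

    def delete(self, msg_id) -> None:
        """Acknowledge successful processing; the message is gone for good."""
        self._messages.pop(msg_id, None)
```

If a consumer crashes before calling `delete`, the message reappears once the timeout elapses, which is why processing that outlasts the timeout can lead to a second delivery.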

Best for: AWS-native applications, simple decoupling, serverless architectures.

Amazon SNS (Simple Notification Service)

A fully managed pub/sub service.

Publisher → [SNS Topic] → SQS Queue
                       → Lambda Function
                       → HTTP Endpoint
                       → Email
                       → SMS

Often used with SQS in a fan-out pattern:

Event → [SNS Topic] → [SQS Queue 1] → Order Service
                    → [SQS Queue 2] → Notification Service
                    → [SQS Queue 3] → Analytics Service

Comparison

| Feature | Kafka | RabbitMQ | SQS |
|---------|-------|----------|-----|
| Model | Distributed log | Message broker | Managed queue |
| Throughput | Very high (millions/sec) | Moderate (tens of thousands/sec) | Scales automatically |
| Ordering | Per partition | Per queue | FIFO or best-effort |
| Retention | Configurable (days/weeks) | Until consumed and acknowledged (classic queues) | Up to 14 days (default 4 days) |
| Replay | Yes (re-read from any retained offset) | Not with classic queues (consumed = gone); supported by RabbitMQ streams | No |
| Routing | Topic + partition key | Exchanges + routing keys | Simple queue |
| Managed | Self-hosted or Confluent Cloud | Self-hosted or CloudAMQP | Fully managed (AWS) |
| Best for | Event streaming, analytics | Complex routing, RPC | Simple decoupling |

Patterns

Dead Letter Queue (DLQ)

Messages that fail processing after retries are moved to a DLQ for investigation.

Main Queue → Consumer → Fails → Retry 1 → Fails → Retry 2 → Fails → DLQ
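The retry-then-DLQ flow above can be sketched with two in-memory collections. This is an illustration under assumptions: `consume_with_dlq` is a hypothetical helper, retries are immediate (real systems usually add a delay or backoff), and `max_attempts=3` matches the three failures shown in the diagram.

```python
from collections import deque


def consume_with_dlq(messages, handler, max_attempts: int = 3) -> list:
    """Process messages; return those that failed every attempt, with the error."""
    main = deque((m, 1) for m in messages)  # (message, attempt number)
    dlq = []
    while main:
        message, attempt = main.popleft()
        try:
            handler(message)
        except Exception as exc:
            if attempt >= max_attempts:
                # Out of retries: park the message for investigation.
                dlq.append((message, repr(exc)))
            else:
                main.append((message, attempt + 1))  # requeue for another try
    return dlq
```

Storing the error alongside the message makes the DLQ far more useful during investigation than the bare payload would be.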

Outbox Pattern

Ensures messages are sent reliably alongside database transactions.

1. BEGIN transaction
2. INSERT INTO orders (...)
3. INSERT INTO outbox (message_payload, status='pending')
4. COMMIT transaction

Background process:
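5. SELECT * FROM outbox WHERE status='pending'
6. Publish each row to the message queue
7. UPDATE outbox SET status='sent' WHERE id = :published_id (only the rows just published)

If the background process crashes between steps 6 and 7, the row is published again on the next run, so delivery is at-least-once and consumers should be idempotent.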
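The numbered steps can be tried end to end with SQLite from the standard library. This is a minimal sketch, not a production relay: the table layout, the function names (`create_schema`, `place_order`, `relay_pending`), and the `publish` callback are all assumptions made for illustration.

```python
import json
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE orders (id TEXT PRIMARY KEY, amount REAL);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            payload TEXT NOT NULL,
            status TEXT NOT NULL DEFAULT 'pending'
        );
    """)


def place_order(conn: sqlite3.Connection, order_id: str, amount: float) -> None:
    """Steps 1-4: the order row and the outbox row commit or roll back together."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "INSERT INTO orders (id, amount) VALUES (?, ?)", (order_id, amount)
        )
        payload = json.dumps({"type": "order.created", "orderId": order_id})
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))


def relay_pending(conn: sqlite3.Connection, publish) -> int:
    """Steps 5-7: publish pending rows, marking each as sent afterwards."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(payload)  # a crash after this line means a re-publish later
        with conn:
            conn.execute("UPDATE outbox SET status = 'sent' WHERE id = ?", (row_id,))
    return len(rows)
```

Publishing before marking the row as sent trades possible duplicates for no lost messages, which pairs naturally with idempotent consumers.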

Saga (Choreography)

Services communicate through events in a message queue to execute a distributed transaction.

Order Created → [Queue] → Payment Service → Payment Completed → [Queue] → Inventory Service → ...

Competing Consumers

Multiple consumers listen on the same queue for load balancing.

[Queue] → Consumer 1 (gets msg 1, 4, 7)
        → Consumer 2 (gets msg 2, 5, 8)
        → Consumer 3 (gets msg 3, 6, 9)
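Competing consumers can be demonstrated in-process with threads pulling from a shared queue. The sketch below assumes a hypothetical `run_workers` helper and uses `queue.Queue` from the standard library as a stand-in for a broker queue.

```python
import queue
import threading


def run_workers(jobs, num_workers: int = 3) -> dict:
    """Drain a shared queue with several workers; return job -> worker name."""
    q = queue.Queue()
    for job in jobs:
        q.put(job)

    handled = {}
    lock = threading.Lock()

    def worker(name: str) -> None:
        while True:
            try:
                job = q.get_nowait()  # each job is handed to exactly one worker
            except queue.Empty:
                return                # nothing left to do
            with lock:
                handled[job] = name

    threads = [
        threading.Thread(target=worker, args=(f"consumer-{i}",))
        for i in range(1, num_workers + 1)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled
```

Every job is processed exactly once, but which worker gets which job depends on scheduling, so the tidy 1-4-7 distribution in the diagram is an idealization rather than a guarantee.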

Priority Queue

Messages are processed based on priority, not just arrival order.

High priority:   [P1 Queue] → Consumer (processes first)
Normal priority: [P2 Queue] → Consumer (processes second)
Low priority:    [P3 Queue] → Consumer (processes last)
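The diagram uses one queue per priority level; an alternative that gives the same processing order is a single heap keyed by priority. The following sketch takes that heap-based approach with a hypothetical `PriorityMessageQueue` class, where a lower number means higher priority.

```python
import heapq
import itertools


class PriorityMessageQueue:
    """Single-heap priority queue; FIFO among messages of equal priority."""

    def __init__(self):
        self._heap = []
        # Monotonic counter breaks ties so equal priorities keep arrival order
        # and message bodies never need to be comparable.
        self._seq = itertools.count()

    def put(self, message, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), message))

    def get(self):
        """Remove and return the highest-priority (lowest number) message."""
        _, _, message = heapq.heappop(self._heap)
        return message

    def __len__(self) -> int:
        return len(self._heap)
```

With either design, a steady stream of high-priority messages can starve low-priority ones, so some systems periodically raise the priority of messages that have waited too long.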

Backpressure

When consumers can't keep up with producers:

| Strategy | Description |
|----------|-------------|
| Drop messages | Oldest or lowest-priority messages are discarded |
| Block producer | Producer blocks until queue has space |
| Buffer to disk | Overflow messages go to disk (slower but durable) |
| Scale consumers | Auto-add more consumer instances |
| Rate limit producers | Slow down incoming message rate |
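Two of these strategies, blocking the producer and dropping the oldest message, can be shown with a bounded `queue.Queue`. This is a sketch only: `offer` and its `strategy` values are hypothetical names, and a real broker applies such policies on the server side.

```python
import queue


def offer(q: queue.Queue, item, strategy: str = "block", timeout: float = 1.0) -> bool:
    """Try to enqueue item on a bounded queue; return False if it was rejected."""
    if strategy == "block":
        try:
            q.put(item, timeout=timeout)  # producer waits for free space
            return True
        except queue.Full:
            return False  # caller can slow down or retry later
    if strategy == "drop_oldest":
        while True:
            try:
                q.put_nowait(item)
                return True
            except queue.Full:
                try:
                    q.get_nowait()  # discard the oldest message to make room
                except queue.Empty:
                    pass  # a consumer drained the queue first; retry the put
    raise ValueError(f"unknown strategy: {strategy}")
```

Blocking preserves every message at the cost of producer latency, whereas dropping keeps producers fast but loses data, so the right choice depends on whether stale messages still have value.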

Key Metrics to Monitor

| Metric | What It Tells You |
|--------|-------------------|
| Queue depth | How many messages are waiting; a growing depth means consumers are too slow |
| Consumer lag | How far behind consumers are from latest messages |
| Processing rate | Messages consumed per second |
| Error rate | Failed message processing attempts |
| DLQ depth | Number of messages that couldn't be processed |
| Message age | How long messages wait before being consumed |

Summary

| Concept | Key Point |
|---------|-----------|
| Purpose | Decouple services, absorb spikes, enable async processing |
| Models | Point-to-point (one consumer) vs Pub/Sub (all subscribers) |
| Delivery | At-least-once is the practical default; use idempotency for safety |
| Kafka | High-throughput event streaming with replay capability |
| RabbitMQ | Flexible routing with exchanges for complex patterns |
| SQS | Fully managed, simple, integrates with AWS ecosystem |
| Patterns | DLQ, Outbox, Saga, Competing Consumers |

Rule of thumb: Use message queues whenever services don't need an immediate response, when you need to absorb traffic spikes, or when you want to decouple services for independent scaling and deployment.