7. Message Queues & Asynchronous Processing

Message queues enable asynchronous communication between services by decoupling producers (senders) from consumers (receivers). They are essential for building scalable, resilient, and loosely coupled distributed systems.


Why Message Queues?

Synchronous Communication Problems

Client → Service A → Service B → Service C → Response
                                      ↑
                              If slow or down,
                              entire chain blocks

Asynchronous Communication with Queues

Client → Service A → [Message Queue] → Service B (processes later)
              ↓
        Immediate ACK

Benefits:

  • Decoupling: Producer doesn't need to know about the consumer.
  • Asynchronous processing: Producer doesn't wait for the consumer.
  • Load leveling: Queue absorbs traffic spikes; consumers process at their own pace.
  • Reliability: Messages persist even if consumers are temporarily down.
  • Scalability: Add more consumers independently to handle load.
  • Retry & fault tolerance: Failed messages can be retried.

Core Concepts

Message

The unit of data sent between producer and consumer.

{
  "id": "msg-123",
  "type": "order.created",
  "timestamp": "2025-01-15T10:30:00Z",
  "payload": {
    "orderId": "order-456",
    "userId": "user-789",
    "amount": 99.99
  }
}

Producer (Publisher)

The service that sends messages to the queue.

Consumer (Subscriber)

The service that receives and processes messages from the queue.

Queue / Topic

The channel through which messages flow.

Broker

The server that manages queues, routing, persistence, and delivery.


Messaging Models

1. Point-to-Point (Queue)

Each message is consumed by exactly one consumer. Multiple consumers can listen, but only one gets each message.

Producer → [Queue] → Consumer A  (gets message 1)
                  → Consumer B  (gets message 2)
                  → Consumer C  (gets message 3)

Use cases: Task distribution, job processing, order processing.

2. Publish-Subscribe (Pub/Sub)

Each message is delivered to all subscribers of a topic.

Producer → [Topic] → Subscriber A  (gets message)
                  → Subscriber B  (gets message)
                  → Subscriber C  (gets message)

Use cases: Event broadcasting, notifications, real-time updates.

3. Consumer Groups

A hybrid: a topic has multiple consumer groups. Each group acts as a single logical subscriber — within a group, each message goes to only one consumer.

Producer → [Topic] → Consumer Group 1: [C1a, C1b] → Each message to one consumer
                  → Consumer Group 2: [C2a, C2b] → Each message to one consumer

Use cases: Multiple services processing the same stream of events independently.


Delivery Guarantees

Guarantee Description Risk
At-most-once Message delivered zero or one times. No retries. Message loss possible
At-least-once Message delivered one or more times. Retries on failure. Duplicate processing possible
Exactly-once Message delivered exactly one time. Hard to achieve; requires idempotency

Achieving Exactly-Once Semantics

True exactly-once delivery is extremely difficult. In practice, it's achieved via:

  1. Idempotent consumers: Processing the same message twice produces the same result.

    // Bad: total += amount (not idempotent)
    // Good: SET order:123:status = "paid" (idempotent)
    
  2. Deduplication: Track processed message IDs.

    if message_id in processed_set:
        skip()
    else:
        process(message)
        processed_set.add(message_id)
    
  3. Transactional outbox: Write message and business data in the same DB transaction.


Message Ordering

Ordering Description Systems
No ordering Messages may arrive in any order Basic SQS
Partition ordering Ordered within a partition/shard Kafka (per partition), Kinesis
Total ordering All messages globally ordered Single partition, SQS FIFO

Kafka's Ordering Model

Topic: "orders" with 3 partitions

Partition 0: [msg1, msg4, msg7]  → Ordered within partition
Partition 1: [msg2, msg5, msg8]  → Ordered within partition
Partition 2: [msg3, msg6, msg9]  → Ordered within partition

Partition key: order_id → Same order always goes to same partition → Ordered per order

Apache Kafka

A distributed event streaming platform designed for high-throughput, durable, ordered message processing.

Architecture:

Producers → [Kafka Broker Cluster] → Consumers
                    │
         ┌──────────┼──────────┐
         │          │          │
    [Broker 1] [Broker 2] [Broker 3]
         │          │          │
    Partitions  Partitions  Partitions

Key concepts:

  • Topics: Logical categories of messages.
  • Partitions: Topics are split into ordered, immutable logs.
  • Offsets: Each message has a sequential ID within its partition.
  • Consumer groups: Parallel consumption with load balancing.
  • Replication: Each partition has configurable replicas across brokers.
  • Retention: Messages persist for a configurable duration (not deleted on consumption).

Characteristics:
| Feature | Value |
|---------|-------|
| Throughput | Millions of messages/sec |
| Latency | Single-digit ms |
| Durability | Disk-based, replicated |
| Ordering | Per partition |
| Retention | Time-based or size-based (configurable) |
| Protocol | Custom TCP protocol |

Best for: Event streaming, log aggregation, real-time analytics, change data capture (CDC).


RabbitMQ

A traditional message broker implementing AMQP (Advanced Message Queuing Protocol).

Architecture:

Producer → [Exchange] → Binding → [Queue] → Consumer

Exchange types:
| Type | Routing |
|------|---------|
| Direct | Routes to queues with matching routing key |
| Fanout | Routes to all bound queues (broadcast) |
| Topic | Routes based on pattern matching (e.g., order.*) |
| Headers | Routes based on message header attributes |

Producer → [Topic Exchange]
               ├── order.created → Queue A (Order Service)
               ├── order.* → Queue B (Analytics Service)
               └── *.created → Queue C (Notification Service)

Characteristics:
| Feature | Value |
|---------|-------|
| Throughput | Tens of thousands/sec |
| Latency | Sub-ms |
| Durability | Optional (persistent or transient) |
| Ordering | Per queue (FIFO) |
| Protocol | AMQP, STOMP, MQTT |
| Acknowledgment | Per-message ACK/NACK |

Best for: Complex routing, task queues, RPC, traditional enterprise messaging.


Amazon SQS (Simple Queue Service)

A fully managed message queue service from AWS.

Types:
| Type | Throughput | Ordering | Deduplication |
|------|-----------|----------|---------------|
| Standard | Unlimited | Best-effort | At-least-once |
| FIFO | 300 msg/sec (3000 with batching) | Strict FIFO | Exactly-once |

Key features:

  • Fully managed (no servers to maintain).
  • Automatic scaling.
  • Dead-letter queues for failed messages.
  • Long polling to reduce empty responses.
  • Message visibility timeout (prevents duplicate processing).

Best for: AWS-native applications, simple decoupling, serverless architectures.


Amazon SNS (Simple Notification Service)

A fully managed pub/sub service.

Publisher → [SNS Topic] → SQS Queue
                       → Lambda Function
                       → HTTP Endpoint
                       → Email
                       → SMS

Often used with SQS in a fan-out pattern:

Event → [SNS Topic] → [SQS Queue 1] → Order Service
                    → [SQS Queue 2] → Notification Service
                    → [SQS Queue 3] → Analytics Service

Comparison

Feature Kafka RabbitMQ SQS
Model Distributed log Message broker Managed queue
Throughput Very high (millions/sec) Moderate (tens of thousands) Scales automatically
Ordering Per partition Per queue FIFO or best-effort
Retention Configurable (days/weeks) Until consumed 14 days max
Replay Yes (re-read from any offset) No (consumed = gone) No
Routing Topic + partition key Exchanges + routing keys Simple queue
Managed Self-hosted or Confluent Cloud Self-hosted or CloudAMQP Fully managed (AWS)
Best for Event streaming, analytics Complex routing, RPC Simple decoupling

Patterns

Dead Letter Queue (DLQ)

Messages that fail processing after retries are moved to a DLQ for investigation.

Main Queue → Consumer → Fails → Retry 1 → Fails → Retry 2 → Fails → DLQ

Outbox Pattern

Ensures messages are sent reliably alongside database transactions.

1. BEGIN transaction
2. INSERT INTO orders (...)
3. INSERT INTO outbox (message_payload, status='pending')
4. COMMIT transaction

Background process:
5. SELECT * FROM outbox WHERE status='pending'
6. Publish to message queue
7. UPDATE outbox SET status='sent'

Saga (Choreography)

Services communicate through events in a message queue to execute a distributed transaction.

Order Created → [Queue] → Payment Service → Payment Completed → [Queue] → Inventory Service → ...

Competing Consumers

Multiple consumers listen on the same queue for load balancing.

[Queue] → Consumer 1 (gets msg 1, 4, 7)
        → Consumer 2 (gets msg 2, 5, 8)
        → Consumer 3 (gets msg 3, 6, 9)

Priority Queue

Messages are processed based on priority, not just arrival order.

High priority:   [P1 Queue] → Consumer (processes first)
Normal priority: [P2 Queue] → Consumer (processes second)
Low priority:    [P3 Queue] → Consumer (processes last)

Backpressure

When consumers can't keep up with producers:

Strategy Description
Drop messages Oldest or lowest-priority messages are discarded
Block producer Producer blocks until queue has space
Buffer to disk Overflow messages go to disk (slower but durable)
Scale consumers Auto-add more consumer instances
Rate limit producers Slow down incoming message rate

Key Metrics to Monitor

Metric What It Tells You
Queue depth How many messages are waiting — growing = consumers too slow
Consumer lag How far behind consumers are from latest messages
Processing rate Messages consumed per second
Error rate Failed message processing attempts
DLQ depth Number of messages that couldn't be processed
Message age How long messages wait before being consumed

Summary

Concept Key Point
Purpose Decouple services, absorb spikes, enable async processing
Models Point-to-point (one consumer) vs Pub/Sub (all subscribers)
Delivery At-least-once is the practical default; use idempotency for safety
Kafka High-throughput event streaming with replay capability
RabbitMQ Flexible routing with exchanges for complex patterns
SQS Fully managed, simple, integrates with AWS ecosystem
Patterns DLQ, Outbox, Saga, Competing Consumers

Rule of thumb: Use message queues whenever services don't need an immediate response, when you need to absorb traffic spikes, or when you want to decouple services for independent scaling and deployment.