Libby Louis

Rambling thoughts from a curious engineer

Designing Rate Limiting: API Gateway vs ALB

When building a new system, it’s easy to focus on core functionality first—APIs, data models, and business logic—and leave concerns like rate limiting for later.

But rate limiting isn’t just a “nice to have.” It directly impacts system stability, cost control, and tenant isolation.

While designing a recent system, I had to decide early on:

Should I introduce Amazon API Gateway, or keep a simpler architecture and handle rate limiting another way?


The Baseline Architecture

The system is built around a fairly standard stack:

  • Containerized backend services
  • An AWS Application Load Balancer (ALB) handling routing
  • A mix of transactional and search-heavy endpoints

At this stage, there’s no inherent rate limiting:

  • The ALB does not enforce per-client limits
  • The application layer does not yet apply throttling

So the question becomes: where should rate limiting live?


Option 1: API Gateway

API Gateway is the more comprehensive of the two options. It bundles rate limiting with a broader set of API management features.

What API Gateway gives you

  • Built-in rate limiting and throttling
    Define steady-state and burst limits per stage or route, and per API key through usage plans. Per-tenant limits follow when each tenant has its own key.
  • API key management
    Issue and revoke keys, track usage, and associate requests with tenants.
  • Usage plans and quotas
    Enforce request caps for different pricing tiers.
  • Request validation and transformation
    Reject invalid requests before they reach your backend.
  • Caching
    Reduce backend load for repeated queries.
  • Integrated WAF support
    Add protection against common attack patterns.
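
The burst and steady-state limits mentioned above are commonly modeled as a token bucket. Here is a minimal sketch of that idea, assuming a single in-process bucket. The class and field names are hypothetical and this is not API Gateway's internal implementation.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Illustrative token bucket (hypothetical names, not an AWS API)."""
    rate: float                                   # tokens refilled per second (steady-state limit)
    burst: int                                    # bucket capacity (burst limit)
    tokens: float = field(init=False)             # tokens currently available
    last: float = field(default=0.0, init=False)  # time of the last refill, in seconds

    def __post_init__(self) -> None:
        # Start with a full bucket so an initial burst is admitted.
        self.tokens = float(self.burst)

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is admitted."""
        elapsed = max(0.0, now - self.last)
        # Refill for the elapsed time, never exceeding the burst capacity.
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With rate=2 and burst=5, five back-to-back requests are admitted and the sixth is rejected. One second later, two more tokens have been refilled, so two more requests get through.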

Tradeoffs to consider

Despite its capabilities, API Gateway introduces meaningful tradeoffs:

  • Additional latency
    Each request takes an extra network hop, typically adding on the order of 10–30 ms.
  • Cost
    Pricing is request-based, which adds up at scale.
  • More complex architecture
    The request path becomes Client → API Gateway → ALB → ECS, unless API Gateway replaces the ALB entirely.
  • Operational overhead
    More infrastructure to manage, monitor, and deploy.
  • Payload size limits
    API Gateway caps request payloads at 10 MB, which can be restrictive for ingestion-heavy endpoints.
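
To see how request-based pricing adds up, here is a rough back-of-the-envelope sketch. The function name is hypothetical, and the per-million price in the usage note is a placeholder, not a quoted AWS price. Actual pricing varies by API type, region, and volume tier.

```python
def monthly_request_cost(requests_per_second: float,
                         price_per_million_usd: float,
                         days: int = 30) -> float:
    """Estimate a monthly charge under a flat per-request price.

    `price_per_million_usd` is an input you supply from the current
    pricing page; nothing here encodes a real AWS price.
    """
    seconds_per_day = 86_400
    total_requests = requests_per_second * seconds_per_day * days
    return total_requests / 1_000_000 * price_per_million_usd
```

For example, a sustained 100 requests per second is about 259 million requests over 30 days, so a placeholder price of $1.00 per million would come to roughly $259 per month before any other charges.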

Option 2: Keep the ALB and Add Targeted Controls

The alternative is to keep the existing architecture and introduce rate limiting in more focused layers.

This approach separates concerns without introducing a full API management layer.


A Layered Approach to Rate Limiting

1. Infrastructure-level protection with AWS WAF

Attaching AWS WAF to the ALB provides a strong first line of defense:

  • Rate-based rules (e.g., block an IP that exceeds a request threshold within a rolling time window)
  • IP allowlists and blocklists
  • Managed rules for common vulnerabilities (SQL injection, XSS)

This handles coarse-grained protection, especially against abusive traffic.
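
As a rough illustration, a rate-based rule can be described as a plain dictionary in the shape that WAFv2 uses for entries in a web ACL's Rules list. The helper name, rule name, and threshold below are hypothetical. The sketch only builds the structure and does not call AWS.

```python
def rate_based_rule(name: str, limit: int, priority: int = 0) -> dict:
    """Build a WAFv2-style rule that blocks a source IP once it exceeds
    `limit` requests within the rule's evaluation window."""
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {
                "Limit": limit,              # request threshold per aggregation key
                "AggregateKeyType": "IP",    # count requests per source IP
            }
        },
        "Action": {"Block": {}},             # block while the IP stays over the limit
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }
```

A call such as rate_based_rule("per-ip-limit", 2000) yields a rule that could be included when creating or updating the web ACL attached to the ALB. The threshold of 2000 is an arbitrary example and would need tuning against real traffic.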


2. Application-level rate limiting

At the application layer, rate limiting can be implemented using middleware backed by a shared store (e.g., Redis).

This enables:

  • Per-tenant and per-user limits
  • Endpoint-specific throttling
  • Flexible policies aligned with product requirements

This is where fine-grained control lives.
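
One common way to build this is a fixed-window counter keyed by tenant and endpoint. The sketch below is a minimal version under stated assumptions: InMemoryStore is a hypothetical stand-in that mimics Redis INCR and EXPIRE so the example runs without a server, and the class names, key format, and limits are illustrative rather than taken from the system described here.

```python
import time
from typing import Callable, Dict, Tuple


class InMemoryStore:
    """Stand-in for a shared store; mimics Redis INCR and EXPIRE semantics."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[int, float]] = {}  # key -> (count, expires_at)

    def incr(self, key: str) -> int:
        count, expires_at = self._data.get(key, (0, float("inf")))
        if self._clock() >= expires_at:  # expired key behaves like a missing key
            count, expires_at = 0, float("inf")
        self._data[key] = (count + 1, expires_at)
        return count + 1

    def expire(self, key: str, seconds: int) -> None:
        count, _ = self._data[key]
        self._data[key] = (count, self._clock() + seconds)


class FixedWindowLimiter:
    """Per-tenant, per-endpoint fixed-window limiter (illustrative only)."""

    def __init__(self, store: InMemoryStore, limits: Dict[str, int],
                 default_limit: int = 100, window_seconds: int = 60,
                 clock: Callable[[], float] = time.time) -> None:
        self._store = store
        self._limits = limits            # endpoint -> max requests per window
        self._default_limit = default_limit
        self._window = window_seconds
        self._clock = clock

    def allow(self, tenant_id: str, endpoint: str) -> bool:
        # The window number is part of the key, so counts reset each window.
        window = int(self._clock() // self._window)
        key = f"rl:{tenant_id}:{endpoint}:{window}"
        count = self._store.incr(key)
        if count == 1:
            # First hit in this window: set a TTL so old keys get cleaned up.
            self._store.expire(key, self._window)
        return count <= self._limits.get(endpoint, self._default_limit)
```

In a web framework, allow() would be called from middleware before the handler runs, returning HTTP 429 when it yields False. Against a real Redis, the increment and expiry are usually combined atomically (for example in a script or transaction) so a crash between the two calls cannot leave a counter without a TTL. Fixed windows also permit short bursts at window boundaries, which a sliding window or token bucket would smooth out.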


Why Not Just Use API Gateway?

API Gateway solves many problems—but it also introduces a new layer of abstraction.

For this system, the requirements were relatively focused:

  • Prevent abuse
  • Protect shared resources
  • Maintain flexibility in how limits are defined

A full API management layer would have solved these problems, but at the cost of:

  • Additional latency
  • Higher operational complexity
  • Increased cost

Instead, a layered approach provides the necessary safeguards without over-engineering the system.


When API Gateway Makes More Sense

There are clear scenarios where API Gateway is the right choice:

  • You need API key management and usage tracking
  • You’re exposing a public or third-party API
  • You want built-in monetization (quotas, usage plans)
  • You prefer centralized request validation and transformation

In those cases, API Gateway becomes part of the product, not just infrastructure.


Takeaway

Designing rate limiting early forces you to think about how your system will behave under stress—not just when everything is working normally.

The key decision isn’t whether to add rate limiting—it’s where to put it.

In this case, the most effective approach was:

  • Use WAF for coarse, infrastructure-level protection
  • Use application-level logic for fine-grained control
  • Avoid introducing unnecessary layers until they’re justified

Good system design isn’t about using the most powerful tool—it’s about using the right amount of system for the problem you have.