Designing Rate Limiting: API Gateway vs ALB

By - The Libzter
Posted on March 4, 2026 / April 4, 2026
Posted in AI

When building a new system, it’s easy to focus on core functionality first—APIs, data models, and business logic—and leave concerns like rate limiting for later.

But rate limiting isn’t just a “nice to have.” It directly impacts system stability, cost control, and tenant isolation.

While designing a recent system, I had to decide early on:

Should I introduce Amazon API Gateway, or keep a simpler architecture and handle rate limiting another way?

The Baseline Architecture

The system is built around a fairly standard stack:

Containerized backend services running on ECS
An AWS Application Load Balancer (ALB) handling routing
A mix of transactional and search-heavy endpoints

At this stage, there’s no inherent rate limiting:

The ALB does not enforce per-client limits
The application layer does not yet apply throttling

So the question becomes: where should rate limiting live?

Option 1: API Gateway

API Gateway is the more comprehensive of the two options, with a wide range of features out of the box.

What API Gateway gives you

Built-in rate limiting and throttling: define steady-state and burst limits per stage, per method, or per API key (through usage plans), which is how per-tenant limits are usually expressed.
API key management: issue and revoke keys, track usage, and associate requests with tenants.
Usage plans and quotas: enforce request caps for different pricing tiers.
Request validation and transformation: reject invalid requests before they reach your backend.
Caching: reduce backend load for repeated queries.
Integrated WAF support: add protection against common attack patterns.
Note that API keys, usage plans, caching, request validation, and WAF integration belong to REST APIs; HTTP APIs offer a smaller feature set.
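
To make "burst" and "steady-state" concrete: AWS describes API Gateway throttling in terms of a token bucket, where the rate is how fast tokens refill and the burst is the bucket size. The snippet below is only a minimal sketch of that idea, not API Gateway's implementation. The class name `TokenBucket` and its parameters are my own, and time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Illustrative token bucket: `rate` tokens refill per second, capped at `burst`."""

    def __init__(self, rate: float, burst: int, now: float = 0.0) -> None:
        self.rate = rate            # steady-state limit (requests per second)
        self.burst = burst          # bucket capacity (largest short spike allowed)
        self.tokens = float(burst)  # the bucket starts full
        self.updated_at = now

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (seconds) is admitted."""
        # Refill for the time elapsed since the last call, never above capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.updated_at = max(self.updated_at, now)

        # Each admitted request spends one token.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=2.0, burst=5)
spike = [bucket.allow(now=0.0) for _ in range(7)]   # 7 requests at the same instant
later = [bucket.allow(now=1.0) for _ in range(3)]   # 3 more requests one second later
```

With a rate of 2 and a burst of 5, the first five simultaneous requests pass and the rest are rejected; one second later, two tokens have refilled, so two more requests pass.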

Tradeoffs to consider

Despite its capabilities, API Gateway introduces meaningful tradeoffs:

Additional latency: each request incurs roughly 10–30ms of extra latency, depending on region and integration type.
Cost: pricing is per request, which adds up at scale.
More complex architecture: the request path becomes Client → API Gateway → ALB → ECS, unless API Gateway replaces the ALB entirely.
Operational overhead: more infrastructure to manage, monitor, and deploy.
Payload size limits: the 10 MB request payload cap can be restrictive for ingestion-heavy endpoints.
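
A quick back-of-the-envelope helper shows how per-request pricing scales with traffic. This is a sketch under stated assumptions: the function name is mine, the month is approximated as 30 days, and the price passed in below is a placeholder, not a quoted AWS price. Plug in the current figure from the pricing page for your API type and region.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 30-day month, an approximation


def monthly_request_cost(avg_rps: float, price_per_million: float) -> float:
    """Estimate the monthly request charge for a steady average request rate."""
    requests_per_month = avg_rps * SECONDS_PER_MONTH
    return requests_per_month / 1_000_000 * price_per_million


# Placeholder price of 1.00 per million requests (NOT an actual AWS price).
estimate = monthly_request_cost(avg_rps=100, price_per_million=1.00)
```

The point is less the exact number than the shape: the charge grows linearly with request volume, on top of whatever the ALB and the containers already cost.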

Option 2: Keep the ALB and Add Targeted Controls

The alternative is to keep the existing architecture and introduce rate limiting in more focused layers.

This approach separates concerns without introducing a full API management layer.

A Layered Approach to Rate Limiting

1. Infrastructure-level protection with AWS WAF

Attaching AWS WAF to the ALB provides a strong first line of defense:

Rate-based rules (e.g., block IPs exceeding a threshold)
IP allowlists and blocklists
Managed rules for common vulnerabilities (SQL injection, XSS)

This handles coarse-grained protection, especially against abusive traffic, but it counts requests by attributes such as source IP rather than by tenant or user.
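
For reference, a rate-based rule looks roughly like this when expressed as the rule structure that the WAFv2 API accepts. This is a sketch rather than a drop-in config: the helper name, the rule name, and the threshold are assumptions of mine, and the field names should be checked against the current WAFv2 reference before use. The function only builds the dictionary; it makes no AWS calls.

```python
def ip_rate_limit_rule(name: str, limit: int, window_sec: int = 300, priority: int = 0) -> dict:
    """Build a WAFv2-style rule that blocks an IP once it exceeds `limit` requests per window."""
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {
                "Limit": limit,                     # requests allowed per evaluation window
                "EvaluationWindowSec": window_sec,  # length of the counting window in seconds
                "AggregateKeyType": "IP",           # count requests per source IP
            }
        },
        "Action": {"Block": {}},                    # block while the IP stays over the limit
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }


# Hypothetical values: 1000 requests per 5 minutes per IP.
rule = ip_rate_limit_rule("per-ip-rate-limit", limit=1000)
```

A rule like this would sit in the rules list of a web ACL that is associated with the ALB.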

2. Application-level rate limiting

At the application layer, rate limiting can be implemented using middleware backed by a shared store (e.g., Redis).

This enables:

Per-tenant and per-user limits
Endpoint-specific throttling
Flexible policies aligned with product requirements

This is where fine-grained control lives.
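
As a rough illustration of what that middleware does, here is a fixed-window counter keyed by tenant and endpoint. It is a minimal sketch, not the implementation used in this system. The in-memory store stands in for Redis (where the same step would be an INCR on the key plus an expiry of one window), and the class names, the `/search` endpoint, and the limits are all hypothetical.

```python
import time
from typing import Callable, Dict


class InMemoryCounterStore:
    """Stand-in for a shared store such as Redis; keys here never expire."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = {}

    def incr(self, key: str) -> int:
        # With Redis this would be an atomic INCR, with an expiry set on the key.
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key]


class FixedWindowLimiter:
    """Allow at most N requests per tenant, per endpoint, per time window."""

    def __init__(
        self,
        store: InMemoryCounterStore,
        endpoint_limits: Dict[str, int],
        default_limit: int,
        window_sec: int = 60,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.store = store
        self.endpoint_limits = endpoint_limits  # endpoint-specific overrides
        self.default_limit = default_limit
        self.window_sec = window_sec
        self.clock = clock

    def allow(self, tenant_id: str, endpoint: str) -> bool:
        limit = self.endpoint_limits.get(endpoint, self.default_limit)
        # All requests in the same window share one counter key.
        window = int(self.clock() // self.window_sec)
        key = f"rl:{tenant_id}:{endpoint}:{window}"
        return self.store.incr(key) <= limit


# Hypothetical policy: the search endpoint is tighter than everything else.
now = [0.0]
limiter = FixedWindowLimiter(
    InMemoryCounterStore(), {"/search": 2}, default_limit=5, clock=lambda: now[0]
)
first_three = [limiter.allow("tenant-a", "/search") for _ in range(3)]
other_tenant = limiter.allow("tenant-b", "/search")
```

A fixed window is the simplest policy to reason about, at the cost of allowing a short double burst across a window boundary; a sliding window or token bucket smooths that out if it matters.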

Why Not Just Use API Gateway?

API Gateway solves many problems—but it also introduces a new layer of abstraction.

For this system, the requirements were relatively focused:

Prevent abuse
Protect shared resources
Maintain flexibility in how limits are defined

A full API management layer would have solved these problems, but at the cost of:

Additional latency
Higher operational complexity
Increased cost

Instead, a layered approach provides the necessary safeguards without over-engineering the system.

When API Gateway Makes More Sense

There are clear scenarios where API Gateway is the right choice:

You need API key management and usage tracking
You’re exposing a public or third-party API
You want built-in monetization (quotas, usage plans)
You prefer centralized request validation and transformation

In those cases, API Gateway becomes part of the product, not just infrastructure.

Takeaway

Designing rate limiting early forces you to think about how your system will behave under stress—not just when everything is working normally.

The key decision isn’t whether to add rate limiting—it’s where to put it.

In this case, the most effective approach was:

Use WAF for coarse, infrastructure-level protection
Use application-level logic for fine-grained control
Avoid introducing unnecessary layers until they’re justified

Good system design isn’t about using the most powerful tool—it’s about using the right amount of system for the problem you have.

Libby Louis
