Designing Rate Limiting: API Gateway vs ALB
When building a new system, it’s easy to focus on core functionality first—APIs, data models, and business logic—and leave concerns like rate limiting for later.
But rate limiting isn’t just a “nice to have.” It directly impacts system stability, cost control, and tenant isolation.
While designing a recent system, I had to decide early on:
Should I introduce Amazon API Gateway, or keep a simpler architecture and handle rate limiting another way?
The Baseline Architecture
The system is built around a fairly standard stack:
- Containerized backend services
- An AWS Application Load Balancer (ALB) handling routing
- A mix of transactional and search-heavy endpoints
At this stage, there’s no inherent rate limiting:
- The ALB does not enforce per-client limits
- The application layer does not yet apply throttling
So the question becomes: where should rate limiting live?
Option 1: API Gateway
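API Gateway is the more comprehensive of the two options, with a wide range of features out of the box. Most of the features below belong to its REST API flavor; the cheaper HTTP API flavor omits several of them.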
What API Gateway gives you
- Built-in rate limiting and throttling: Define burst and steady-state limits per route, API key, or tenant.
- API key management: Issue and revoke keys, track usage, and associate requests with tenants.
- Usage plans and quotas: Enforce request caps for different pricing tiers.
- Request validation and transformation: Reject invalid requests before they reach your backend.
- Caching: Reduce backend load for repeated queries.
- Integrated WAF support: Add protection against common attack patterns.
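To make "burst" and "steady-state" concrete: AWS describes API Gateway's throttling as a token bucket, where the rate is how fast tokens refill and the burst is the bucket size. Below is a minimal sketch of that idea in Python. It is not API Gateway's implementation; the class name, fields, and numbers are my own for illustration.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Token bucket: `rate` tokens refill per second, capped at `burst`."""

    rate: float               # steady-state requests per second
    burst: int                # bucket size, i.e. short-term burst capacity
    tokens: float = 0.0       # current tokens; reset to `burst` on creation
    updated_at: float = 0.0   # time of the last refill, in seconds

    def __post_init__(self) -> None:
        self.tokens = float(self.burst)  # start with a full bucket

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `TokenBucket(rate=2, burst=5)`, five back-to-back requests pass and the sixth is rejected; one second later, two more tokens have refilled, so two further requests pass.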
Tradeoffs to consider
Despite its capabilities, API Gateway introduces meaningful tradeoffs:
- Additional latency: Each request passes through an extra hop, adding roughly 10–30 ms.
- Cost: Pricing is per request, so the bill grows in step with traffic.
- More complex architecture: The request path becomes Client → API Gateway → ALB → ECS, unless API Gateway replaces the ALB entirely.
- Operational overhead: More infrastructure to manage, monitor, and deploy.
- Payload size limits: API Gateway caps payloads at 10 MB, which can be restrictive for ingestion-heavy endpoints.
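A quick way to see how per-request pricing adds up is to turn sustained traffic into a monthly figure. The sketch below is only back-of-the-envelope arithmetic: `price_per_million` is a placeholder you would fill in from the current AWS pricing page, not a quoted price, and it ignores tiered discounts, data transfer, and caching charges.

```python
def monthly_request_cost(requests_per_second: float, price_per_million: float) -> float:
    """Estimate a 30-day bill for a flat per-request price."""
    seconds_per_month = 30 * 24 * 60 * 60  # 2,592,000 seconds
    requests = requests_per_second * seconds_per_month
    return requests / 1_000_000 * price_per_million
```

For example, a sustained 500 requests per second is about 1.3 billion requests in 30 days, so at a placeholder price of 1.00 per million requests the function returns 1296.0.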
Option 2: Keep the ALB and Add Targeted Controls
The alternative is to keep the existing architecture and introduce rate limiting in more focused layers.
This approach separates concerns without introducing a full API management layer.
A Layered Approach to Rate Limiting
1. Infrastructure-level protection with AWS WAF
Attaching AWS WAF to the ALB provides a strong first line of defense:
- Rate-based rules (e.g., block an IP that exceeds a request threshold within WAF's evaluation window)
- IP allowlists and blocklists
- Managed rules for common vulnerabilities (SQL injection, XSS)
This handles coarse-grained protection, especially against abusive traffic.
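As a rough illustration, a per-IP rate-based rule can be expressed as a small piece of configuration. The helper below builds a dict in the shape I understand the WAFv2 rule JSON to take; treat the field names as something to verify against the current API reference. The helper itself, the rule name, and the limit are hypothetical.

```python
def rate_based_rule(name: str, limit: int, priority: int = 0) -> dict:
    """Build a WAFv2-style rule that blocks an IP exceeding `limit` requests
    within WAF's evaluation window."""
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {
                "Limit": limit,              # requests allowed per window, per key
                "AggregateKeyType": "IP",    # count requests per source IP
            }
        },
        "Action": {"Block": {}},             # block once the limit is exceeded
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }
```

A rule such as `rate_based_rule("per-ip-limit", 2000)` would then sit in the rules list of the web ACL associated with the ALB.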
2. Application-level rate limiting
At the application layer, rate limiting can be implemented using middleware backed by a shared store (e.g., Redis), so that every container instance checks the same counters.
This enables:
- Per-tenant and per-user limits
- Endpoint-specific throttling
- Flexible policies aligned with product requirements
This is where fine-grained control lives.
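Here is a minimal sketch of what that middleware logic might look like, using a fixed-window counter keyed by tenant and endpoint. To keep it self-contained, an in-memory dict stands in for Redis; in a real deployment the increment would be an atomic Redis `INCR` with an expiry on the key. The class name, key format, and limits are all hypothetical.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Fixed-window counter keyed by tenant, endpoint, and time window."""

    def __init__(self, limits: dict, default_limit: int, window_seconds: int = 60):
        self.limits = limits                  # per-endpoint request caps
        self.default_limit = default_limit    # cap for endpoints not listed
        self.window_seconds = window_seconds
        # Stand-in for the shared store. Redis would expire old keys;
        # this dict never does, which is fine for a sketch only.
        self.counts = defaultdict(int)

    def allow(self, tenant_id: str, endpoint: str, now: float) -> bool:
        """Count this request and report whether it is within the limit."""
        window = int(now // self.window_seconds)
        key = f"rl:{tenant_id}:{endpoint}:{window}"
        self.counts[key] += 1  # with Redis: INCR key, then set an expiry
        return self.counts[key] <= self.limits.get(endpoint, self.default_limit)
```

With `FixedWindowLimiter(limits={"/search": 2}, default_limit=5)`, a tenant's third `/search` call in the same minute is rejected, while a different tenant, or the same tenant in the next window, is unaffected. Fixed windows allow a brief double burst at window boundaries; a sliding window or token bucket smooths that out at the cost of a little more state.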
Why Not Just Use API Gateway?
API Gateway solves many problems—but it also introduces a new layer of abstraction.
For this system, the requirements were relatively focused:
- Prevent abuse
- Protect shared resources
- Maintain flexibility in how limits are defined
A full API management layer would have solved these problems, but at the cost of:
- Additional latency
- Higher operational complexity
- Increased cost
Instead, a layered approach provides the necessary safeguards without over-engineering the system.
When API Gateway Makes More Sense
There are clear scenarios where API Gateway is the right choice:
- You need API key management and usage tracking
- You’re exposing a public or third-party API
- You want built-in monetization (quotas, usage plans)
- You prefer centralized request validation and transformation
In those cases, API Gateway becomes part of the product, not just infrastructure.
Takeaway
Designing rate limiting early forces you to think about how your system will behave under stress—not just when everything is working normally.
The key decision isn’t whether to add rate limiting—it’s where to put it.
In this case, the most effective approach was:
- Use WAF for coarse, infrastructure-level protection
- Use application-level logic for fine-grained control
- Avoid introducing unnecessary layers until they’re justified
Good system design isn’t about using the most powerful tool—it’s about using the right amount of system for the problem you have.