## What is Rate Limiting?
Rate limiting restricts how many requests a user or IP address can make to your API within a timeframe. Once the limit is exceeded, additional requests are rejected until the window resets.
Think of it as a bouncer at a club counting how many times someone enters. After 100 entries in an hour, they stop letting that person in until the next hour.
## Why Rate Limiting Exists
**Prevent Abuse**: Without limits, one user could spam your API with millions of requests, crashing your servers or racking up huge costs.
**Fair Usage**: Ensure all users get reasonable access. Prevent one user from hogging all resources.
**Cost Control**: Cloud services charge per request. Rate limiting prevents astronomical bills from malicious or buggy clients.
**Security**: Slow down brute force attacks, credential stuffing, and automated scraping.
## How It Works
1. **Set a limit**: "100 requests per hour per user."
2. **Track requests**: Each time a user makes a request, increment their counter.
3. **Enforce**: When they hit 100, reject additional requests with a **429 Too Many Requests** error.
4. **Reset**: After an hour, the counter resets to zero.
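The four steps above can be sketched as a minimal in-memory limiter (class and method names are illustrative, not from any particular library; the injectable clock just makes it easy to test):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow at most `limit` requests per `window` seconds, per user."""

    def __init__(self, limit=100, window=3600, clock=time.time):
        self.limit = limit
        self.window = window
        self.clock = clock                  # injectable for testing
        self.counters = defaultdict(int)    # (user, window index) -> count

    def allow(self, user):
        # All timestamps in the same `window`-second slice share one index.
        window_index = int(self.clock()) // self.window
        key = (user, window_index)
        if self.counters[key] >= self.limit:
            return False  # caller should respond with 429 Too Many Requests
        self.counters[key] += 1
        return True
```

A real deployment would store the counters in shared storage (see the implementation section below) rather than process memory, but the track/enforce/reset logic is the same.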
## Rate Limiting Strategies
**Fixed Window**: 100 requests per hour, resets at the top of each hour. Simple, but bursty at the boundary: a user can make 100 requests at 1:59 and 100 more at 2:01, squeezing 200 requests into two minutes.
**Sliding Window**: Tracks requests over a rolling 60-minute window, which eliminates the boundary burst. More accurate but more complex to implement.
**Token Bucket**: Users get tokens that refill over time. Allows bursts while maintaining average rate.
**Leaky Bucket**: Requests processed at steady rate. Excess requests queue up or get rejected.
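Of these strategies, the token bucket is worth a closer look because it separates burst size (the bucket's capacity) from the long-run average (the refill rate). A minimal sketch, with hypothetical names and an injectable clock for testability:

```python
import time

class TokenBucket:
    """Token bucket: `capacity` bounds bursts; `refill_rate` sets the average."""

    def __init__(self, capacity=100, refill_rate=100 / 3600, clock=time.monotonic):
        self.capacity = capacity        # maximum burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill proportionally to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # spend one token per request
            return True
        return False
```

With `capacity=100` and `refill_rate=100/3600`, a client can burst 100 requests immediately, then sustain roughly 100 per hour thereafter.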
## Real-World Examples
**Twitter API**: historically around 900 requests per 15 minutes for many read endpoints. Exceed it and you wait.
**GitHub API**: 5,000 requests per hour for authenticated users, 60 per hour for unauthenticated. Clear, documented limits.
**Stripe**: Prevents brute forcing payment methods by limiting failed payment attempts.
**OpenAI API**: Rate limits prevent abuse while allowing legitimate usage.
## HTTP Status Code
When a client is rate limited, APIs return **429 Too Many Requests**, typically with headers showing:
- How many requests are allowed (`X-RateLimit-Limit`)
- How many remain in the current window (`X-RateLimit-Remaining`)
- When the limit resets (`X-RateLimit-Reset`, often a Unix timestamp)
Many APIs also send a `Retry-After` header giving the number of seconds to wait.
```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 23
X-RateLimit-Reset: 1640000000
```
Good clients respect these headers and slow down.
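Respecting the headers is mostly arithmetic. A sketch of a client-side helper, assuming the common (but not standardized) `X-RateLimit-*` names shown above:

```python
import time

def seconds_until_reset(headers, clock=time.time):
    """Return how long to pause before the next request, given rate-limit
    response headers. Returns 0 when requests remain in the window."""
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    # No requests left: wait until the reset timestamp (Unix seconds).
    reset_at = float(headers.get("X-RateLimit-Reset", clock()))
    return max(0.0, reset_at - clock())
```

A client would call this after each response and `time.sleep()` for the returned duration before retrying.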
## Implementing Rate Limiting
**Redis**: Store counters in Redis. Fast, handles high traffic, expires keys automatically.
**API Gateway**: AWS API Gateway, Kong, Nginx have built-in rate limiting.
**Libraries**: `express-rate-limit` (Node.js), `django-ratelimit` (Python). Easy to add to existing apps.
Most developers use existing tools rather than building from scratch.
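For the Redis approach, the classic fixed-window pattern is `INCR` a per-user key and set an expiry on the first hit. Sketched below with a tiny in-memory stand-in for the Redis connection so the example is self-contained; with `redis-py`, the same `incr`/`expire` calls work against a real server (production code often wraps the two commands in a Lua script so they execute atomically):

```python
def allow_request(conn, user, limit=100, window=3600):
    """Redis fixed-window pattern: INCR a counter; on the first request
    in a window, EXPIRE the key so it vanishes when the window ends."""
    key = f"ratelimit:{user}"
    count = conn.incr(key)
    if count == 1:
        conn.expire(key, window)  # first hit starts the window's clock
    return count <= limit

class FakeRedis:
    """In-memory stand-in for a Redis connection (ignores expiry; demo only)."""
    def __init__(self):
        self.data = {}
    def incr(self, key):
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]
    def expire(self, key, seconds):
        pass
```

Because Redis is shared, this works across many API server processes, which a purely in-process counter cannot do.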
## Rate Limiting Strategies by Use Case
**Public API**: Generous limits for normal use, strict enough to prevent abuse.
**Internal API**: Higher limits since traffic is controlled and trusted.
**Authentication**: Very strict on login attempts (prevent brute force).
**Free vs Paid Tiers**: Free users get lower limits, paid users get higher. Common monetization strategy.
## User Experience
Communicate limits clearly in documentation. Show remaining requests in responses. Provide meaningful error messages when limited.
Bad: "Error 429"
Good: "Rate limit exceeded. You can make 100 requests per hour. Try again in 23 minutes."
## Handling Rate Limits as a Client
**Respect Limits**: Do not hammer APIs. Implement backoff strategies.
**Cache Responses**: Reduce requests by caching data locally.
**Batch Requests**: Some APIs allow requesting multiple resources in one call.
**Monitor Usage**: Track your request count to avoid hitting limits unexpectedly.
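A common backoff strategy is exponential backoff with "full jitter": the wait ceiling doubles on each retry (up to a cap), and the client sleeps a random fraction of it so that many clients retrying at once don't stampede the API in lockstep. A sketch (function name and parameters are illustrative):

```python
import random

def backoff_delays(max_retries=5, base=1.0, cap=60.0, rng=random.random):
    """Compute retry delays: ceiling = min(cap, base * 2**attempt),
    actual delay = random fraction of that ceiling (full jitter)."""
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng() * ceiling)
    return delays
```

A client would `time.sleep()` through these delays between retries, stopping early once a request succeeds.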
## The Balance
Too strict: Frustrates legitimate users.
Too lenient: Allows abuse and drives up costs.
Finding the right balance requires understanding your users and monitoring actual usage patterns.
Rate limiting is essential infrastructure for any production API. It protects your service, controls costs, and ensures fair access for all users.