Skip to content

RateLimiter

The rate limiter pattern controls the rate of calls to a function using a token bucket algorithm. Unlike the bulkhead (which limits concurrent calls), the rate limiter limits calls per time period.

Concepts

The rate limiter uses a token bucket algorithm:

  • The bucket holds a maximum number of tokens (equal to max_calls)
  • Each call consumes one token
  • Tokens are refilled continuously at a rate of max_calls / period
  • If no tokens are available, the call is either rejected or waits
Token Bucket (max_calls=5, period=1.0)

Time 0.0s: [*] [*] [*] [*] [*]  (5 tokens)
Call 1:     [*] [*] [*] [*] [ ]  (4 tokens)
Call 2:     [*] [*] [*] [ ] [ ]  (3 tokens)
Call 3:     [*] [*] [ ] [ ] [ ]  (2 tokens)
Call 4:     [*] [ ] [ ] [ ] [ ]  (1 token)
Call 5:     [ ] [ ] [ ] [ ] [ ]  (0 tokens)
Call 6:     REJECTED! (or waits for refill)

Time 0.2s: [*] [ ] [ ] [ ] [ ]  (1 token refilled)
Call 6:     [ ] [ ] [ ] [ ] [ ]  (succeeds now)

Rate Limiter vs Bulkhead

Rate Limiter Bulkhead
Limits Calls per time window Concurrent calls
Use case API rate limits, throttling Connection pool protection
Example 100 requests/minute 10 concurrent requests
Algorithm Token bucket Semaphore

Configuration

from pyresilience import RateLimiterConfig

config = RateLimiterConfig(
    max_calls=10,    # 10 calls per period
    period=1.0,      # Per 1 second
    max_wait=0.0,    # Reject immediately if no tokens (0 = no waiting)
)
Parameter Type Default Description
max_calls int 10 Maximum number of calls allowed per period
period float 1.0 Time period in seconds
max_wait float 0.0 Maximum seconds to wait for a token. 0 means reject immediately.

Usage

Basic Rate Limiting

from pyresilience import resilient, RateLimiterConfig

# Allow 10 requests per second
@resilient(rate_limiter=RateLimiterConfig(max_calls=10, period=1.0))
def call_api(endpoint: str) -> dict:
    return requests.get(endpoint).json()

With Waiting

Allow callers to wait for a token instead of immediate rejection:

# 100 calls per minute, wait up to 5 seconds for a token
@resilient(rate_limiter=RateLimiterConfig(
    max_calls=100,
    period=60.0,
    max_wait=5.0,
))
def call_api() -> dict:
    return requests.get("https://api.example.com").json()

Async Rate Limiting

@resilient(rate_limiter=RateLimiterConfig(max_calls=50, period=1.0))
async def async_call() -> dict:
    async with aiohttp.ClientSession() as session:
        async with session.get("https://api.example.com") as resp:
            return await resp.json()

With Fallback

Return a queued/deferred response instead of rejecting:

from pyresilience import (
    resilient, RateLimiterConfig, FallbackConfig, RateLimitExceededError
)

@resilient(
    rate_limiter=RateLimiterConfig(max_calls=10, period=1.0),
    fallback=FallbackConfig(
        handler=lambda e: {"status": "rate_limited", "retry_after": 1},
        fallback_on=(RateLimitExceededError,),
    ),
)
def call_api() -> dict:
    return requests.get("https://api.example.com").json()

Combined with Circuit Breaker

@resilient(
    rate_limiter=RateLimiterConfig(max_calls=100, period=60.0),
    circuit_breaker=CircuitBreakerConfig(failure_threshold=5),
    retry=RetryConfig(max_attempts=3),
)
def robust_call() -> dict:
    return requests.get("https://api.example.com").json()

Events

Event When
EventType.RATE_LIMITED A call was rejected because the rate limit was exceeded

Exception

from pyresilience import RateLimitExceededError

try:
    result = my_function()
except RateLimitExceededError:
    # Rate limit exceeded, try again later
    time.sleep(1)

Direct Usage

Use the rate limiter without the decorator:

from pyresilience import RateLimiter, RateLimiterConfig

rl = RateLimiter(RateLimiterConfig(max_calls=10, period=1.0))

if rl.acquire():
    result = do_something()
else:
    print("Rate limit exceeded")

# Reset to full capacity
rl.reset()