Rate limits
API throttling, 429 behavior, and best practices for retries
All endpoints are protected by rate limiting.
Limits
- Global default: 3 requests/second per IP
- Some endpoints define stricter/explicit overrides
Example overrides:
| Endpoint | Limit |
|---|---|
| POST /model-swap | 5/minute |
| POST /flat-2-model | 5/minute |
| POST /identity/upload | 5/minute |
| POST /notifications | 12/minute |
| PUT /webhooks | 10/minute |
| POST /webhooks/test | 5/minute |
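
To stay under a per-endpoint limit such as 5/minute, a client can throttle itself before sending. The following is a minimal sketch of a sliding-window limiter. The class name `SlidingWindowLimiter` and its parameters are illustrative and not part of the API, and this page does not say whether the server counts fixed or sliding windows, so treat it as a conservative approximation rather than a mirror of the server's logic.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard: allow at most `limit` calls per `window` seconds.

    Hypothetical helper, not part of the API. The clock is injectable so the
    logic can be exercised without real waiting.
    """

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._sent = deque()  # timestamps of calls still inside the window

    def wait_time(self):
        """Return seconds to wait before the next call is allowed (0 if none)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            return 0.0
        return self.window - (now - self._sent[0])

    def record(self):
        """Record that a call was just sent."""
        self._sent.append(self.clock())


# Example: POST /webhooks/test is listed at 5/minute.
webhook_test_limiter = SlidingWindowLimiter(limit=5, window=60.0)
```

Before each request, call `wait_time()`, pause for that long if it is non-zero, then call `record()` once the request is sent. The server remains the source of truth, so a 429 can still occur and should still be handled.
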
Rate-limit response headers
When a rate limit is hit, the response includes the following headers:
| Header | Description |
|---|---|
| X-RateLimit-Limit | The total number of requests allowed for the current window and endpoint |
| X-RateLimit-Remaining | The number of requests remaining in the current window |
| X-RateLimit-Reset | The timestamp (epoch) at which the current window resets |
| Retry-After | The time (in seconds) to wait before making a new request |
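
As a sketch of how these headers might be consumed, the helper below works out how long to pause after a rate-limited response. The function name `seconds_to_wait` is hypothetical. It assumes `X-RateLimit-Reset` is expressed in epoch seconds (the table says only "epoch") and that `Retry-After` is a plain number of seconds, as described above.

```python
import time


def seconds_to_wait(headers, now=None, default=1.0):
    """Estimate how long to pause after a rate-limited response.

    Prefers Retry-After; falls back to X-RateLimit-Reset (assumed to be
    epoch seconds); otherwise returns `default`.

    `headers` is any mapping with a .get() method. A plain dict is
    case-sensitive, while requests' response.headers is not.
    """
    if now is None:
        now = time.time()

    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not a plain number; try the reset timestamp instead

    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        try:
            return max(0.0, float(reset) - now)
        except ValueError:
            pass  # unparseable timestamp; fall through to the default

    return default
```

Clamping at zero guards against a reset timestamp that has already passed by the time the response is processed.
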
Handling 429 responses
Respect Retry-After and retry with backoff:

    import time
    import requests


    def call_with_retry(url, headers, payload, retries=3):
        for attempt in range(retries):
            response = requests.post(url, headers=headers, json=payload)
            if response.status_code != 429:
                return response
            wait_seconds = int(response.headers.get("Retry-After", 2 ** attempt))
            # NOTE: this is an example. Do not sleep in production code!
            # Use an async backoff strategy instead.
            time.sleep(wait_seconds)
        return response

Note on account caps
Some actions also have account-level limits (for example, the maximum number of identities you can create). These are separate from request-rate limits and depend on your plan.
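
Returning to 429 handling: the retry example notes that blocking sleeps are best avoided in production. One possible non-blocking variant, sketched with `asyncio`, is shown here. The `send` argument is a hypothetical coroutine function standing in for whatever async HTTP client you use; it is assumed to return an object with `status_code` and `headers`, mirroring the `requests` response used in that example. The jitter range and the `base` and `cap` values are illustrative, not recommendations from the API.

```python
import asyncio
import random


async def call_with_retry_async(send, retries=3, base=1.0, cap=30.0):
    """Await `send()` until it returns a non-429 response or retries run out.

    `send` is a caller-supplied coroutine function (hypothetical here).
    Returns the last response received, which may still be a 429.
    """
    response = None
    for attempt in range(retries):
        response = await send()
        if response.status_code != 429:
            return response
        if attempt == retries - 1:
            break  # no point waiting after the final attempt
        retry_after = response.headers.get("Retry-After")
        try:
            wait_seconds = float(retry_after)
        except (TypeError, ValueError):
            # Header missing or not numeric: exponential backoff with jitter.
            wait_seconds = min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)
        # asyncio.sleep yields to the event loop instead of blocking the thread.
        await asyncio.sleep(max(0.0, wait_seconds))
    return response
```

Passing the request as a coroutine function keeps the retry logic independent of any particular HTTP library, and the jitter helps avoid many clients retrying at the same instant.
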