Rate Limits
alembica manages the rate limits imposed by the various AI service providers. Understanding these limits is essential for efficient application development and deployment.
Understanding Rate Limits
Rate limits control how frequently you can access AI models and how much data you can process in a given timeframe. These limits vary by provider and subscription tier, and typically include:
- RPM: Requests per minute
- RPD: Requests per day
- TPM: Tokens per minute
- TPD: Tokens per day
Exceeding these limits results in throttled requests or errors. alembica helps you stay within them by providing fallback mechanisms and retry strategies.
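Client-side throttling against per-minute budgets can be sketched generically. The class below is an illustrative sliding-window limiter for RPM and TPM budgets, not alembica's actual implementation:

```python
import time

class MinuteWindowLimiter:
    """Illustrative sliding-window limiter for RPM and TPM budgets.

    A sketch of generic client-side throttling, not alembica's internal code.
    """

    def __init__(self, rpm_limit, tpm_limit):
        self.rpm_limit = rpm_limit
        self.tpm_limit = tpm_limit
        self.events = []  # (timestamp, tokens) pairs from the last 60 seconds

    def _prune(self, now):
        # Drop events that have left the 60-second window.
        self.events = [(t, tok) for t, tok in self.events if now - t < 60]

    def wait_time(self, tokens, now=None):
        """Seconds to wait before a request costing `tokens` may be sent."""
        now = time.monotonic() if now is None else now
        self._prune(now)
        used_tokens = sum(tok for _, tok in self.events)
        if len(self.events) < self.rpm_limit and used_tokens + tokens <= self.tpm_limit:
            return 0.0
        if not self.events:
            # A single request larger than the TPM budget can never be sent.
            return 0.0
        # Otherwise wait until the oldest event leaves the window.
        oldest = self.events[0][0]
        return max(0.0, 60 - (now - oldest))

    def record(self, tokens, now=None):
        """Register a request that was actually sent."""
        now = time.monotonic() if now is None else now
        self.events.append((now, tokens))
```

A caller would check `wait_time` before each request, sleep if it is positive, then `record` the request's token count after sending it.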
Disclaimer: Daily limits (RPD and TPD) are not currently supported by alembica. Users are responsible for enforcing these limits in their own applications.
Cloud/local note: AWS Bedrock, Azure AI, Vertex AI, and SelfHosted deployments have provider-specific rate limits that are not documented here. Set `tpm_limit` and `rpm_limit` in your input JSON when you need client-side throttling.
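For example, a minimal throttling fragment might look like the following. The two field names come from the note above; where they sit within the full input JSON should be checked against alembica's input schema:

```json
{
  "rpm_limit": 50,
  "tpm_limit": 20000
}
```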
Anthropic
(January 2026, Tier 1 users)
Anthropic uses a tiered usage system (Tier 1-4) where rate limits apply at the organization level across all models. The limits below represent Tier 1 thresholds. Higher tiers provide increased limits and are automatically granted based on cumulative API credit purchases and usage history.
Tier 1 Requirements:
- Credit Purchase: $5
- Maximum Spend Limit: $100/month
Tier 1 Rate Limits (apply to all Claude models):
- RPM (Requests Per Minute): 50
- ITPM (Input Tokens Per Minute): Varies by model class (see below)
- OTPM (Output Tokens Per Minute): Varies by model class (see below)
| Model Class | RPM | ITPM | OTPM |
|---|---|---|---|
| Opus 4.x (4.0, 4.5) | 50 | 20,000 | 8,000 |
| Sonnet 4.x (4.0, 4.5) | 50 | 20,000 | 8,000 |
| Haiku 4.5 | 50 | 25,000 | 10,000 |
| Claude 3.7 Sonnet | 50 | 40,000 | 8,000 |
| Claude 3.5 Sonnet | 50 | 40,000 | 8,000 |
| Claude 3.5 Haiku | 50 | 50,000 | 10,000 |
| Claude 3 Opus | 50 | 20,000 | 8,000 |
| Claude 3 Sonnet | 50 | 40,000 | 8,000 |
| Claude 3 Haiku | 50 | 50,000 | 10,000 |
Note: Only uncached input tokens and cache creation tokens count towards ITPM limits for most models. Cached tokens (cache reads) do not count, effectively allowing 5-10x higher throughput when using prompt caching. For detailed information about Anthropic’s tiered system, visit their official rate limits documentation.
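A worked example of the caching note above, using the 20,000 ITPM Sonnet-class figure from the table and hypothetical request sizes:

```python
# Effective input throughput when cache reads do not count toward ITPM.
# All numbers are hypothetical except the 20,000 ITPM from the table above.
itpm_limit = 20_000            # Tier 1 ITPM for a Sonnet-class model
tokens_per_request = 10_000    # hypothetical total input tokens per request
cache_read_tokens = 9_000      # hypothetical tokens served as cache reads

# Only uncached input and cache-creation tokens count against ITPM.
counted_per_request = tokens_per_request - cache_read_tokens     # 1,000
requests_per_minute_by_itpm = itpm_limit // counted_per_request  # 20

# 20 requests/min is below the Tier 1 RPM cap of 50, so ITPM is binding.
effective_itpm = requests_per_minute_by_itpm * tokens_per_request

print(effective_itpm)  # 200000 -> 10x the nominal 20,000 ITPM
```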
Cohere
Cohere production keys have no rate limit, while trial keys are limited to 20 API calls per minute.
Perplexity
(January 2026, Tier 1 users)
Perplexity uses a tiered usage system (Tier 0-5) where rate limits increase based on cumulative API credit purchases. Tier 1 requires $50+ in lifetime purchases.
Tier 1 Rate Limits:
| Model | RPM |
|---|---|
| Sonar Deep Research | 10 |
| Sonar Reasoning Pro | 150 |
| Sonar Pro | 150 |
| Sonar | 150 |
Note: Tiers are based on cumulative purchases. Higher tiers (2-5) provide significantly increased rate limits. For detailed information, visit Perplexity’s rate limits documentation.
DeepSeek
DeepSeek does not impose rate limits.
GoogleAI
(May 2025)
Tier 1:
| Model | RPM | RPD | TPM |
|---|---|---|---|
| Gemini 2.0 Flash | 2,000 | - | 4,000,000 |
| Gemini 2.0 Flash Lite | 4,000 | - | 4,000,000 |
| Gemini 1.5 Flash | 2,000 | - | 4,000,000 |
| Gemini 1.5 Pro | 1,000 | - | 4,000,000 |
OpenAI
(May 2025, Tier 1 users)
| Model | RPM | RPD | TPM | Batch Queue Limit |
|---|---|---|---|---|
| o4-mini | 500 | - | 200,000 | 2,000,000 |
| o3-mini | 500 | - | 200,000 | 2,000,000 |
| o3 | 500 | - | 30,000 | 90,000 |
| o1-mini | 500 | - | 200,000 | 2,000,000 |
| o1 | 500 | - | 30,000 | 90,000 |
| gpt-4.1-nano | 500 | - | 200,000 | 2,000,000 |
| gpt-4.1-mini | 500 | - | 200,000 | 2,000,000 |
| gpt-4.1 | 500 | - | 30,000 | 900,000 |
| gpt-4o | 500 | - | 30,000 | 90,000 |
| gpt-4o-mini | 500 | 10,000 | 200,000 | 2,000,000 |
| gpt-4-turbo | 500 | - | 30,000 | 90,000 |
| gpt-3.5-turbo | 500 | 10,000 | 200,000 | 2,000,000 |