Rate Limits

alembica manages the rate limits imposed by various AI service providers. Understanding these limits is essential for building and deploying applications efficiently.

Understanding Rate Limits

Rate limits control how frequently you can call AI models and how much data you can process in a given timeframe. They vary by provider and subscription tier, and typically include requests per minute (RPM), tokens per minute (TPM), and, for some providers, requests per day (RPD) and tokens per day (TPD).

Exceeding these limits may result in request throttling or errors. alembica helps manage these constraints by providing appropriate fallback mechanisms and retry strategies.
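The retry strategy mentioned above can be sketched generically. This is an illustrative exponential-backoff loop, not alembica's actual implementation; RateLimitError is a placeholder for whatever throttling exception your provider SDK raises on HTTP 429.

```python
import random
import time

class RateLimitError(Exception):
    """Placeholder for a provider SDK's HTTP 429 throttling exception."""

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Retry request_fn with exponential backoff and jitter on rate-limit
    errors. A generic sketch, not alembica's internal retry logic."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # budget exhausted; surface the error to the caller
            # Wait base * 2^attempt, plus jitter to avoid thundering herds.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

The jitter term matters in practice: without it, many clients throttled at the same moment would all retry in lockstep and hit the limit again together.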

Disclaimer: Daily limits (RPD and TPD) are not currently supported by alembica. Users are responsible for implementing and respecting these constraints on their own within their applications.
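Since alembica does not track daily limits, applications that face RPD or TPD caps need their own accounting. A minimal sketch of a client-side daily token budget follows; all names here are illustrative and not part of alembica.

```python
import time

class DailyTokenBudget:
    """Client-side guard for a daily token limit (TPD), which alembica
    does not enforce. Illustrative only; not an alembica API."""

    def __init__(self, tpd_limit):
        self.tpd_limit = tpd_limit
        self.used = 0
        self.window_start = time.time()

    def consume(self, tokens):
        # Reset the counter once 24 hours have elapsed.
        if time.time() - self.window_start >= 86_400:
            self.used = 0
            self.window_start = time.time()
        if self.used + tokens > self.tpd_limit:
            return False  # over budget; caller should defer the request
        self.used += tokens
        return True
```

The same pattern works for requests per day: count 1 per call instead of the token total.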

Cloud/local note: AWS Bedrock, Azure AI, Vertex AI, and SelfHosted deployments have provider-specific rate limits that are not documented here. Set tpm_limit and rpm_limit in your input JSON when you need client-side throttling.
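The tpm_limit and rpm_limit fields named above might be set as in the following sketch. Only those two field names come from the text; the surrounding keys and values are illustrative and may differ from alembica's actual input schema.

```python
import json

# tpm_limit and rpm_limit are the fields named in the docs; the rest of
# this structure is a hypothetical example of an input JSON payload.
request = {
    "provider": "AWSBedrock",                  # illustrative provider name
    "model": "anthropic.claude-3-5-sonnet",    # illustrative model id
    "tpm_limit": 100_000,  # client-side tokens-per-minute throttle
    "rpm_limit": 50,       # client-side requests-per-minute throttle
}
print(json.dumps(request, indent=2))
```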

Anthropic

(January 2026, Tier 1 users)

Anthropic uses a tiered usage system (Tier 1-4) where rate limits apply at the organization level across all models. The limits below represent Tier 1 thresholds. Higher tiers provide increased limits and are automatically granted based on cumulative API credit purchases and usage history.

Tier 1 Rate Limits (apply to all Claude models; RPM = requests per minute, ITPM = input tokens per minute, OTPM = output tokens per minute):

Model Class | RPM | ITPM | OTPM
Opus 4.x (4.0, 4.5) | 50 | 20,000 | 8,000
Sonnet 4.x (4.0, 4.5) | 50 | 20,000 | 8,000
Haiku 4.5 | 50 | 25,000 | 10,000
Claude 3.7 Sonnet | 50 | 40,000 | 8,000
Claude 3.5 Sonnet | 50 | 40,000 | 8,000
Claude 3.5 Haiku | 50 | 50,000 | 10,000
Claude 3 Opus | 50 | 20,000 | 8,000
Claude 3 Sonnet | 50 | 40,000 | 8,000
Claude 3 Haiku | 50 | 50,000 | 10,000

Note: Only uncached input tokens and cache creation tokens count towards ITPM limits for most models. Cached tokens (cache reads) do not count, effectively allowing 5-10x higher throughput when using prompt caching. For detailed information about Anthropic’s tiered system, visit their official rate limits documentation.
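The effect of prompt caching on ITPM can be made concrete with a small worked example. The numbers below are illustrative, chosen to match the Tier 1 Sonnet limit from the table above.

```python
# Worked example: Tier 1 Sonnet-class ITPM is 20,000. Cache reads do not
# count toward ITPM, so caching most of a large prompt raises the
# sustainable request rate. Prompt/cache sizes here are illustrative.
itpm_limit = 20_000
prompt_tokens = 10_000
cached_tokens = 9_000  # served as cache reads after the first request

uncached = prompt_tokens - cached_tokens      # 1,000 tokens count per call
without_cache = itpm_limit // prompt_tokens   # 2 requests per minute
with_cache = itpm_limit // uncached           # 20 requests per minute
print(without_cache, with_cache)
```

With 90% of the prompt cached, throughput rises tenfold, which is where the "5-10x" figure above comes from at typical cache hit rates.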

Cohere

Cohere production keys have no rate limit, while trial keys are limited to 20 API calls per minute.

Perplexity

(January 2026, Tier 1 users)

Perplexity uses a tiered usage system (Tier 0-5) where rate limits increase based on cumulative API credit purchases. Tier 1 requires $50+ in lifetime purchases.

Tier 1 Rate Limits:

Model | RPM
Sonar Deep Research | 10
Sonar Reasoning Pro | 150
Sonar Pro | 150
Sonar | 150

Note: Tiers are based on cumulative purchases. Higher tiers (2-5) provide significantly increased rate limits. For detailed information, visit Perplexity’s rate limits documentation.

DeepSeek

DeepSeek does not impose rate limits.

GoogleAI

(May 2025)

Tier 1:

Model | RPM | RPD | TPM
Gemini 2.0 Flash | 2,000 | - | 4,000,000
Gemini 2.0 Flash Lite | 4,000 | - | 4,000,000
Gemini 1.5 Flash | 2,000 | - | 4,000,000
Gemini 1.5 Pro | 1,000 | - | 4,000,000

OpenAI

(May 2025, Tier 1 users)

Model | RPM | RPD | TPM | Batch Queue Limit
o4-mini | 500 | - | 200,000 | 2,000,000
o3-mini | 500 | - | 200,000 | 2,000,000
o3 | 500 | - | 30,000 | 90,000
o1-mini | 500 | - | 200,000 | 2,000,000
o1 | 500 | - | 30,000 | 90,000
gpt-4.1-nano | 500 | - | 200,000 | 2,000,000
gpt-4.1-mini | 500 | - | 200,000 | 2,000,000
gpt-4.1 | 500 | - | 30,000 | 900,000
gpt-4o | 500 | - | 30,000 | 90,000
gpt-4o-mini | 500 | 10,000 | 200,000 | 2,000,000
gpt-4-turbo | 500 | - | 30,000 | 90,000
gpt-3.5-turbo | 500 | 10,000 | 200,000 | 2,000,000