Rate Limits
alembica manages the rate limits imposed by the various AI service providers. Understanding these limits is essential for efficient application development and deployment.
Understanding Rate Limits
Rate limits control how frequently you can access AI models and how much data you can process in a given timeframe. These limits vary by provider and subscription tier, and typically include:
- RPM: Requests per minute
- RPD: Requests per day
- TPM: Tokens per minute
- TPD: Tokens per day
Exceeding these limits results in throttled requests or errors. alembica helps you stay within them by providing fallback mechanisms and retry strategies.
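Client-side throttling against per-minute budgets can be sketched generically. The class below is an illustrative sliding-window limiter for RPM and TPM budgets, not alembica's actual implementation:

```python
import time

class MinuteWindowLimiter:
    """Illustrative sliding-window limiter for RPM and TPM budgets.

    A sketch of generic client-side throttling, not alembica's internal code.
    """

    def __init__(self, rpm_limit, tpm_limit):
        self.rpm_limit = rpm_limit
        self.tpm_limit = tpm_limit
        self.events = []  # (timestamp, tokens) pairs from the last 60 seconds

    def _prune(self, now):
        # Drop events that have left the 60-second window.
        self.events = [(t, tok) for t, tok in self.events if now - t < 60]

    def wait_time(self, tokens, now=None):
        """Seconds to wait before a request costing `tokens` may be sent."""
        now = time.monotonic() if now is None else now
        self._prune(now)
        used_tokens = sum(tok for _, tok in self.events)
        if len(self.events) < self.rpm_limit and used_tokens + tokens <= self.tpm_limit:
            return 0.0
        if not self.events:
            # A single request larger than the TPM budget can never be sent.
            return 0.0
        # Otherwise wait until the oldest event leaves the window.
        oldest = self.events[0][0]
        return max(0.0, 60 - (now - oldest))

    def record(self, tokens, now=None):
        """Register a request that was actually sent."""
        now = time.monotonic() if now is None else now
        self.events.append((now, tokens))
```

A caller would check `wait_time` before each request, sleep if it is positive, then `record` the request's token count after sending it.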
Disclaimer: Daily limits (RPD and TPD) are not currently supported by alembica. Users are responsible for enforcing these limits in their own applications.
Cloud/local note: AWS Bedrock, Azure AI, Vertex AI, and SelfHosted deployments have provider-specific rate limits that are not documented here. Set `tpm_limit` and `rpm_limit` in your input JSON when you need client-side throttling.
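For example, a minimal throttling fragment might look like the following. The two field names come from the note above; where they sit within the full input JSON should be checked against alembica's input schema:

```json
{
  "rpm_limit": 50,
  "tpm_limit": 20000
}
```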
Anthropic
(January 2026, Tier 1 users)
Anthropic uses a tiered usage system (Tier 1-4) where rate limits apply at the organization level across all models. The limits below represent Tier 1 thresholds. Higher tiers provide increased limits and are automatically granted based on cumulative API credit purchases and usage history.
Tier 1 Requirements:
- Credit Purchase: $5
- Maximum Spend Limit: $100/month
Tier 1 Rate Limits (apply to all Claude models):
- RPM (Requests Per Minute): 50
- ITPM (Input Tokens Per Minute): Varies by model class (see below)
- OTPM (Output Tokens Per Minute): Varies by model class (see below)
| Model Class | RPM | ITPM | OTPM |
|---|---|---|---|
| Opus 4.x (4.0, 4.5) | 50 | 20,000 | 8,000 |
| Sonnet 4.x (4.0, 4.5) | 50 | 20,000 | 8,000 |
| Haiku 4.5 | 50 | 25,000 | 10,000 |
| Claude 3.7 Sonnet | 50 | 40,000 | 8,000 |
| Claude 3.5 Sonnet | 50 | 40,000 | 8,000 |
| Claude 3.5 Haiku | 50 | 50,000 | 10,000 |
| Claude 3 Opus | 50 | 20,000 | 8,000 |
| Claude 3 Sonnet | 50 | 40,000 | 8,000 |
| Claude 3 Haiku | 50 | 50,000 | 10,000 |
Note: Only uncached input tokens and cache creation tokens count towards ITPM limits for most models. Cached tokens (cache reads) do not count, effectively allowing 5-10x higher throughput when using prompt caching. For detailed information about Anthropic’s tiered system, visit their official rate limits documentation.
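A worked example of the caching note above, using the 20,000 ITPM Sonnet-class figure from the table and hypothetical request sizes:

```python
# Effective input throughput when cache reads do not count toward ITPM.
# All numbers are hypothetical except the 20,000 ITPM from the table above.
itpm_limit = 20_000            # Tier 1 ITPM for a Sonnet-class model
tokens_per_request = 10_000    # hypothetical total input tokens per request
cache_read_tokens = 9_000      # hypothetical tokens served as cache reads

# Only uncached input and cache-creation tokens count against ITPM.
counted_per_request = tokens_per_request - cache_read_tokens     # 1,000
requests_per_minute_by_itpm = itpm_limit // counted_per_request  # 20

# 20 requests/min is below the Tier 1 RPM cap of 50, so ITPM is binding.
effective_itpm = requests_per_minute_by_itpm * tokens_per_request

print(effective_itpm)  # 200000 -> 10x the nominal 20,000 ITPM
```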
Cohere
Cohere production keys have no rate limit, while trial keys are limited to 20 API calls per minute.
Perplexity
(January 2026, Tier 1 users)
Perplexity uses a tiered usage system (Tier 0-5) where rate limits increase based on cumulative API credit purchases. Tier 1 requires $50+ in lifetime purchases.
Tier 1 Rate Limits:
| Model | RPM |
|---|---|
| Sonar Deep Research | 10 |
| Sonar Reasoning Pro | 150 |
| Sonar Pro | 150 |
| Sonar | 150 |
Note: Tiers are based on cumulative purchases. Higher tiers (2-5) provide significantly increased rate limits. For detailed information, visit Perplexity’s rate limits documentation.
DeepSeek
DeepSeek does not impose rate limits.
GoogleAI
(May 2025)
Tier 1:
| Model | RPM | RPD | TPM |
|---|---|---|---|
| Gemini 2.0 Flash | 2,000 | - | 4,000,000 |
| Gemini 2.0 Flash Lite | 4,000 | - | 4,000,000 |
| Gemini 1.5 Flash | 2,000 | - | 4,000,000 |
| Gemini 1.5 Pro | 1,000 | - | 4,000,000 |
OpenAI
(May 2025, Tier 1 users)
| Model | RPM | RPD | TPM | Batch Queue Limit |
|---|---|---|---|---|
| o4-mini | 500 | - | 200,000 | 2,000,000 |
| o3-mini | 500 | - | 200,000 | 2,000,000 |
| o3 | 500 | - | 30,000 | 90,000 |
| o1-mini | 500 | - | 200,000 | 2,000,000 |
| o1 | 500 | - | 30,000 | 90,000 |
| gpt-4.1-nano | 500 | - | 200,000 | 2,000,000 |
| gpt-4.1-mini | 500 | - | 200,000 | 2,000,000 |
| gpt-4.1 | 500 | - | 30,000 | 900,000 |
| gpt-4o | 500 | - | 30,000 | 90,000 |
| gpt-4o-mini | 500 | 10,000 | 200,000 | 2,000,000 |
| gpt-4-turbo | 500 | - | 30,000 | 90,000 |
| gpt-3.5-turbo | 500 | 10,000 | 200,000 | 2,000,000 |