Supported Models

alembica supports a variety of AI models from leading providers including OpenAI, GoogleAI, Cohere, Anthropic, DeepSeek, Perplexity, AWS Bedrock, Azure AI, Vertex AI, and self-hosted OpenAI-compatible endpoints.

The tables below list every supported model by provider, including each model's maximum input token capacity and the cost per million input tokens, so you can choose a model that matches your context length requirements and budget. Cloud providers (AWS Bedrock, Azure AI, Vertex AI) and self-hosted endpoints use custom model IDs, and alembica does not compute costs for them.
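For flat-rate models, input cost is simply the token count times the per-million rate listed in the tables below. A minimal sketch (the helper function is illustrative, not part of alembica):

```python
# Illustrative helper (not part of alembica): input cost at a flat per-million rate.
def input_cost(tokens: int, rate_per_million: float) -> float:
    return tokens / 1_000_000 * rate_per_million

# Example: a 50,000-token prompt to GPT-4o at $2.50 per 1M input tokens.
print(f"${input_cost(50_000, 2.50):.3f}")  # → $0.125
```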

Self-Hosted (OpenAI-Compatible)

Local endpoints such as Ollama, vLLM, LM Studio, or LocalAI are supported via the SelfHosted provider. Model IDs and limits are defined by your local runtime, and costs are not computed.
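As a sketch of what "OpenAI-compatible" means in practice, the snippet below builds a standard `/v1/chat/completions` request body. The base URL and model ID (`llama3.2` on an Ollama instance at `localhost:11434`) are example values defined by your local runtime, not by alembica:

```python
import json

# Sketch of a request to a self-hosted OpenAI-compatible endpoint.
# Assumes an Ollama instance on localhost:11434; the model ID and port
# are examples set by your local runtime, not by alembica.
BASE_URL = "http://localhost:11434/v1"

payload = {
    "model": "llama3.2",  # model ID comes from your runtime
    "messages": [{"role": "user", "content": "Hello"}],
}

# The request would be POSTed to f"{BASE_URL}/chat/completions".
print(json.dumps(payload))
```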

OpenAI

| Model | Maximum Input Tokens | Cost per 1M Input Tokens |
|---|---|---|
| GPT-5.2 | 400,000 | $1.75 |
| GPT-5.1 | 400,000 | $1.25 |
| GPT-5 | 400,000 | $1.25 |
| GPT-5 Mini | 128,000 | $0.25 |
| GPT-5 Nano | 128,000 | $0.05 |
| o4 Mini | 200,000 | $1.10 |
| o3 Mini | 200,000 | $1.10 |
| o3 | 200,000 | $2.00 |
| o1 Mini | 128,000 | $1.10 |
| o1 | 200,000 | $15.00 |
| GPT-4.1 Nano | 1,000,000 | $0.10 |
| GPT-4.1 Mini | 1,000,000 | $0.40 |
| GPT-4.1 | 1,000,000 | $2.00 |
| GPT-4o Mini | 128,000 | $0.15 |
| GPT-4o | 128,000 | $2.50 |
| GPT-4 Turbo | 128,000 | $10.00 |
| GPT-3.5 Turbo | 16,385 | $0.50 |

Anthropic

| Model | Maximum Input Tokens | Cost per 1M Input Tokens |
|---|---|---|
| Claude 4.5 Opus | 200,000 | $15.00 |
| Claude 4.5 Sonnet | 200,000 | $3.00 |
| Claude 4.5 Haiku | 200,000 | $1.00 |
| Claude 4.0 Opus | 200,000 | $15.00 |
| Claude 4.0 Sonnet | 200,000 | $3.00 |
| Claude 3.7 Sonnet | 200,000 | $3.00 |
| Claude 3.5 Sonnet | 200,000 | $3.00 |
| Claude 3.5 Haiku | 200,000 | $0.80 |
| Claude 3 Opus | 200,000 | $15.00 |
| Claude 3 Haiku | 200,000 | $0.25 |

Cohere

| Model | Maximum Input Tokens | Cost per 1M Input Tokens |
|---|---|---|
| Command A | 256,000 | $2.50 |
| Command A Reasoning | 256,000 | Free (preview) |
| Command R+ | 128,000 | $2.50 |
| Command R7B | 128,000 | $0.0375 |
| Command R, August 2024 | 128,000 | $0.15 |
| Command R | 128,000 | $0.15 |
| Command Light | 4,096 | $0.30 |
| Command | 4,096 | $1.00 |

Perplexity

Perplexity's Sonar models provide real-time web search and are accessed through an OpenAI-compatible API.

| Model | Maximum Input Tokens | Cost per 1M Input Tokens |
|---|---|---|
| Sonar | 128,000 | $1.00 |
| Sonar Pro | 128,000 | $3.00 |
| Sonar Reasoning Pro | 128,000 | $2.00 |
| Sonar Deep Research | 128,000 | $2.00 |

DeepSeek

DeepSeek V3.2 models feature automatic context caching: cache hits are billed at $0.028 per million input tokens, 90% cheaper than the cache-miss rate shown below. Maximum output is 8K tokens for the chat model and 64K for the reasoner.

| Model | Maximum Input Tokens | Cost per 1M Input Tokens |
|---|---|---|
| DeepSeek-V3.2-Chat | 128,000 | $0.28 (cache miss) |
| DeepSeek-V3.2-Reasoner | 128,000 | $0.28 (cache miss) |
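The cache discount adds up quickly on repeated prompts. A minimal sketch of the hit-versus-miss arithmetic, using the rates above (the function name is illustrative):

```python
# Illustrative DeepSeek V3.2 input-pricing sketch (rates per 1M input tokens).
CACHE_MISS_RATE = 0.28   # $ per 1M tokens, from the table above
CACHE_HIT_RATE = 0.028   # $ per 1M tokens, 90% cheaper on cache hits

def deepseek_input_cost(tokens: int, cached: bool) -> float:
    rate = CACHE_HIT_RATE if cached else CACHE_MISS_RATE
    return tokens / 1_000_000 * rate

# A 100,000-token prompt costs $0.028 on a cache miss, $0.0028 on a hit.
print(deepseek_input_cost(100_000, cached=False))
print(deepseek_input_cost(100_000, cached=True))
```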

GoogleAI

| Model | Maximum Input Tokens | Cost per 1M Input Tokens |
|---|---|---|
| Gemini 3 Pro Preview | 1,048,576 | $2.00 (≤200k), $4.00 (>200k) |
| Gemini 3 Flash Preview | 1,048,576 | $0.50 |
| Gemini 2.5 Pro | 1,048,576 | $1.25 (≤200k), $2.50 (>200k) |
| Gemini 2.5 Flash | 1,048,576 | $0.30 |
| Gemini 2.5 Flash-Lite | 1,048,576 | $0.10 |
| Gemini 2.0 Flash Lite | 1,048,576 | $0.075 |
| Gemini 2.0 Flash | 1,048,576 | $0.10 |
| Gemini 1.5 Flash | 1,048,576 | $0.15 |
| Gemini 1.5 Pro | 2,097,152 | $2.50 |
| Gemini 1.0 Pro (deprecated) | 32,760 | $0.50 |
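For the tiered Gemini rates, this sketch assumes the higher rate applies to the entire prompt once it crosses the 200k-token threshold (check Google's pricing page for the authoritative rule):

```python
# Illustrative sketch of tiered Gemini input pricing; assumes the higher
# rate applies to the whole prompt once it exceeds the 200k-token threshold.
def gemini_input_cost(tokens: int, low_rate: float, high_rate: float,
                      threshold: int = 200_000) -> float:
    rate = high_rate if tokens > threshold else low_rate
    return tokens / 1_000_000 * rate

# Gemini 2.5 Pro: $1.25/M up to 200k tokens, $2.50/M beyond.
print(gemini_input_cost(150_000, 1.25, 2.50))  # below the threshold
print(gemini_input_cost(300_000, 1.25, 2.50))  # above the threshold
```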

AWS Bedrock (Llama and other models)

AWS Bedrock models are addressed by their Bedrock model IDs. Costs and context limits vary by model and are not tracked by alembica.

Azure AI (Azure OpenAI / Model Catalog)

Azure deployments are addressed by your deployment name in the model field. Costs and limits vary by deployment and are not tracked by alembica.

Vertex AI (Model Garden)

Vertex AI models are addressed by their model IDs in the Model Garden. Costs and context limits vary by model and are not tracked by alembica.