Supported Models

alembica supports a variety of AI models from leading providers including OpenAI, GoogleAI, Cohere, Anthropic, DeepSeek, Perplexity, AWS Bedrock, Azure AI, Vertex AI, and self-hosted OpenAI-compatible endpoints.

The tables below list every supported model by provider, including each model's maximum input token capacity and the cost per million input tokens, so you can choose a model that matches your context length requirements and budget. Cloud providers (AWS Bedrock, Azure AI, Vertex AI) and self-hosted endpoints use custom model IDs, and alembica does not compute costs for them.
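For flat-rate models, input cost is simply the token count times the per-million rate listed in the tables below. A minimal sketch (the helper function is illustrative, not part of alembica):

```python
# Illustrative helper (not part of alembica): input cost at a flat per-million rate.
def input_cost(tokens: int, rate_per_million: float) -> float:
    return tokens / 1_000_000 * rate_per_million

# Example: a 50,000-token prompt to GPT-4o at $2.50 per 1M input tokens.
print(f"${input_cost(50_000, 2.50):.3f}")  # → $0.125
```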

Self-Hosted (OpenAI-Compatible)

Local endpoints such as Ollama, vLLM, LM Studio, or LocalAI are supported via the SelfHosted provider. Model IDs and limits are defined by your local runtime, and costs are not computed.
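As a sketch of what "OpenAI-compatible" means in practice, the snippet below builds a standard `/v1/chat/completions` request body. The base URL and model ID (`llama3.2` on an Ollama instance at `localhost:11434`) are example values defined by your local runtime, not by alembica:

```python
import json

# Sketch of a request to a self-hosted OpenAI-compatible endpoint.
# Assumes an Ollama instance on localhost:11434; the model ID and port
# are examples set by your local runtime, not by alembica.
BASE_URL = "http://localhost:11434/v1"

payload = {
    "model": "llama3.2",  # model ID comes from your runtime
    "messages": [{"role": "user", "content": "Hello"}],
}

# The request would be POSTed to f"{BASE_URL}/chat/completions".
print(json.dumps(payload))
```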

OpenAI

| Model | Maximum Input Tokens | Cost per 1M Input Tokens |
|---|---|---|
| GPT-5.2 | 400,000 | $1.75 |
| GPT-5.1 | 400,000 | $1.25 |
| GPT-5 | 400,000 | $1.25 |
| GPT-5 Mini | 128,000 | $0.25 |
| GPT-5 Nano | 128,000 | $0.05 |
| o4 Mini | 200,000 | $1.10 |
| o3 Mini | 200,000 | $1.10 |
| o3 | 200,000 | $2.00 |
| o1 Mini | 128,000 | $1.10 |
| o1 | 200,000 | $15.00 |
| GPT-4.1 Nano | 1,000,000 | $0.10 |
| GPT-4.1 Mini | 1,000,000 | $0.40 |
| GPT-4.1 | 1,000,000 | $2.00 |
| GPT-4o Mini | 128,000 | $0.15 |
| GPT-4o | 128,000 | $2.50 |
| GPT-4 Turbo | 128,000 | $10.00 |
| GPT-3.5 Turbo | 16,385 | $0.50 |

Anthropic

| Model | Maximum Input Tokens | Cost per 1M Input Tokens |
|---|---|---|
| Claude 4.5 Opus | 200,000 | $15.00 |
| Claude 4.5 Sonnet | 200,000 | $3.00 |
| Claude 4.5 Haiku | 200,000 | $1.00 |
| Claude 4.0 Opus | 200,000 | $15.00 |
| Claude 4.0 Sonnet | 200,000 | $3.00 |
| Claude 3.7 Sonnet | 200,000 | $3.00 |
| Claude 3.5 Sonnet | 200,000 | $3.00 |
| Claude 3.5 Haiku | 200,000 | $0.80 |
| Claude 3 Opus | 200,000 | $15.00 |
| Claude 3 Haiku | 200,000 | $0.25 |

Cohere

| Model | Maximum Input Tokens | Cost per 1M Input Tokens |
|---|---|---|
| Command A | 256,000 | $2.50 |
| Command A Reasoning | 256,000 | Free (preview) |
| Command R+ | 128,000 | $2.50 |
| Command R7B | 128,000 | $0.0375 |
| Command R, August 2024 | 128,000 | $0.15 |
| Command R | 128,000 | $0.15 |
| Command Light | 4,096 | $0.30 |
| Command | 4,096 | $1.00 |

Perplexity

Perplexity's Sonar models provide real-time web search and are accessed through an OpenAI-compatible API.

| Model | Maximum Input Tokens | Cost per 1M Input Tokens |
|---|---|---|
| Sonar | 128,000 | $1.00 |
| Sonar Pro | 128,000 | $3.00 |
| Sonar Reasoning Pro | 128,000 | $2.00 |
| Sonar Deep Research | 128,000 | $2.00 |

DeepSeek

DeepSeek V3.2 models feature automatic context caching: cache hits are billed at $0.028 per million input tokens, 90% cheaper than the cache-miss rate shown below. Maximum output is 8K tokens for the chat model and 64K for the reasoner.

| Model | Maximum Input Tokens | Cost per 1M Input Tokens |
|---|---|---|
| DeepSeek-V3.2-Chat | 128,000 | $0.28 (cache miss) |
| DeepSeek-V3.2-Reasoner | 128,000 | $0.28 (cache miss) |
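The cache discount adds up quickly on repeated prompts. A minimal sketch of the hit-versus-miss arithmetic, using the rates above (the function name is illustrative):

```python
# Illustrative DeepSeek V3.2 input-pricing sketch (rates per 1M input tokens).
CACHE_MISS_RATE = 0.28   # $ per 1M tokens, from the table above
CACHE_HIT_RATE = 0.028   # $ per 1M tokens, 90% cheaper on cache hits

def deepseek_input_cost(tokens: int, cached: bool) -> float:
    rate = CACHE_HIT_RATE if cached else CACHE_MISS_RATE
    return tokens / 1_000_000 * rate

# A 100,000-token prompt costs $0.028 on a cache miss, $0.0028 on a hit.
print(deepseek_input_cost(100_000, cached=False))
print(deepseek_input_cost(100_000, cached=True))
```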

GoogleAI

| Model | Maximum Input Tokens | Cost per 1M Input Tokens |
|---|---|---|
| Gemini 3 Pro Preview | 1,048,576 | $2.00 (≤200k), $4.00 (>200k) |
| Gemini 3 Flash Preview | 1,048,576 | $0.50 |
| Gemini 2.5 Pro | 1,048,576 | $1.25 (≤200k), $2.50 (>200k) |
| Gemini 2.5 Flash | 1,048,576 | $0.30 |
| Gemini 2.5 Flash-Lite | 1,048,576 | $0.10 |
| Gemini 2.0 Flash Lite | 1,048,576 | $0.075 |
| Gemini 2.0 Flash | 1,048,576 | $0.10 |
| Gemini 1.5 Flash | 1,048,576 | $0.15 |
| Gemini 1.5 Pro | 2,097,152 | $2.50 |
| Gemini 1.0 Pro (deprecated) | 32,760 | $0.50 |
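For the tiered Gemini rates, this sketch assumes the higher rate applies to the entire prompt once it crosses the 200k-token threshold (check Google's pricing page for the authoritative rule):

```python
# Illustrative sketch of tiered Gemini input pricing; assumes the higher
# rate applies to the whole prompt once it exceeds the 200k-token threshold.
def gemini_input_cost(tokens: int, low_rate: float, high_rate: float,
                      threshold: int = 200_000) -> float:
    rate = high_rate if tokens > threshold else low_rate
    return tokens / 1_000_000 * rate

# Gemini 2.5 Pro: $1.25/M up to 200k tokens, $2.50/M beyond.
print(gemini_input_cost(150_000, 1.25, 2.50))  # below the threshold
print(gemini_input_cost(300_000, 1.25, 2.50))  # above the threshold
```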

AWS Bedrock (Llama and other models)

AWS Bedrock models are addressed by their Bedrock model IDs. Costs and context limits vary by model and are not tracked by alembica.

Azure AI (Azure OpenAI / Model Catalog)

Azure deployments are addressed by your deployment name in the model field. Costs and limits vary by deployment and are not tracked by alembica.

Vertex AI (Model Garden)

Vertex AI models are addressed by their model IDs in the Model Garden. Costs and context limits vary by model and are not tracked by alembica.