AI Providers & Models
7 providers, 25+ models, from free to frontier. Bring your own keys and mix models from different providers in the same deliberation — or run on Consilium's free-tier pool when you don't have keys.
All Models & Pricing
Prices are per 1 million tokens, charged by the provider (not Consilium). Consilium is BYOK — you pay providers directly through your own API keys.
| Provider | Model | Input/1M | Output/1M | Tier |
|---|---|---|---|---|
| Anthropic | Claude Opus 4.7 | $5.00 | $25.00 | Most Capable |
| Anthropic | Claude Opus 4.6 | $5.00 | $25.00 | Previous Flagship |
| Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 | Balanced |
| Anthropic | Claude Haiku 4.5 | $1.00 | $5.00 | Fast |
| OpenAI | GPT-5.5 Pro | $30.00 | $180.00 | Most Capable |
| OpenAI | GPT-5.5 | $5.00 | $30.00 | Flagship |
| OpenAI | GPT-5.4 | $2.00 | $8.00 | Reasoning |
| OpenAI | GPT-5.4 Mini | $0.20 | $0.80 | Cost-effective |
| OpenAI | GPT-5.4 Nano | $0.08 | $0.30 | Lowest Cost |
| Google | Gemini 3.1 Pro | $1.25 | $5.00 | Most Capable |
| Google | Gemini 3 Flash | $0.15 | $0.60 | Balanced |
| Google | Gemini 3.1 Flash-Lite | $0.05 | $0.20 | Lowest Cost |
| Groq | Llama 3.1 8B | Free | Free | Instant |
| Groq | Llama 3.3 70B | Free | Free | Versatile |
| Groq | GPT-OSS 120B | Free | Free | Open-Weight Flagship |
| Groq | GPT-OSS 20B | Free | Free | Open-Weight |
| Groq | Groq Compound | $0.80 | $1.60 | Agentic |
| xAI | Grok 4.20 | $3.00 | $15.00 | Most Capable |
| xAI | Grok 4.1 Fast (reasoning) | $1.00 | $4.00 | Reasoning |
| xAI | Grok 4.1 Fast | $0.50 | $2.00 | Fast |
| xAI | Grok Code Fast | $0.30 | $1.20 | Coding |
| Moonshot | Kimi K2.6 | $1.20 | $2.50 | 1T Open-Source |
| OpenRouter | Gemma 4 26B (free) | Free | Free | Free Tier |
| OpenRouter | Gemma 4 31B (free) | Free | Free | Free Tier |
| OpenRouter | Qwen3 Coder (free) | Free | Free | Free Tier |
| OpenRouter | Nemotron 3 Super 120B (free) | Free | Free | Free Tier |
| OpenRouter | Ling 2.6 1T (free) | Free | Free | Free Tier |
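Because pricing is per 1M tokens and billed by the provider under BYOK, per-call cost is a simple proportion. A minimal sketch, using a few rows from the table above; the `PRICING` dict and `estimate_cost` helper are illustrative, not part of Consilium's API:

```python
# Estimate the provider-billed cost of a single call from the per-1M-token
# prices in the table above. Illustrative helper, not a Consilium API.
PRICING = {  # model: (input $/1M tokens, output $/1M tokens)
    "claude-sonnet-4-6": (3.00, 15.00),
    "gpt-5.4-mini": (0.20, 0.80),
    "gemini-3-flash": (0.15, 0.60),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return estimated USD cost for one call, charged by the provider."""
    in_price, out_price = PRICING[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A 10k-input / 2k-output call on Claude Sonnet 4.6 costs about $0.06:
print(round(estimate_cost("claude-sonnet-4-6", 10_000, 2_000), 4))
```

Mixing tiers in one deliberation (say, Sonnet for proposals and Flash for critiques) lets you trade cost against depth per role.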
Provider Details
Claude models excel at nuanced reasoning, following complex instructions, and agentic coding. Claude Opus 4.7 is Anthropic's most capable generally available model with a step-change improvement in agentic coding over Opus 4.6. Claude Sonnet 4.6 offers the best balance of speed and intelligence with a 1M-token context. Claude Haiku 4.5 is the fastest model with near-frontier intelligence.
Environment Variable
ANTHROPIC_API_KEY
API Client
anthropic.AsyncAnthropic
Strengths
Nuanced reasoning, instruction following, agentic coding, 1M context, structured output
GPT-5.5 and GPT-5.5 Pro are OpenAI's latest flagship models with 1M-token context windows, available through the Responses and Chat Completions APIs. GPT-5.4 (and its Mini/Nano variants) remains the cost-tier option for high-volume workloads. GPT-4o, GPT-4.1, and o4-mini were retired by OpenAI in February 2026 and are no longer supported.
Environment Variable
OPENAI_API_KEY
API Client
openai.AsyncOpenAI (httpx.AsyncClient)
Strengths
General capability, code generation, agentic tool use, multimodal, 1M context
Gemini 3.1 Pro is Google's most advanced reasoning model, optimized for complex agentic workflows and coding. Gemini 3 Flash provides strong frontier-class performance at low cost, while Gemini 3.1 Flash-Lite is the cheapest option for high-volume, latency-sensitive traffic. Gemini 1.x and 2.x lines have been deprecated.
Environment Variable
GOOGLE_API_KEY
API Client
google.generativeai.GenerativeModel
Strengths
Cost efficiency, multimodal, long context, frontier reasoning
Groq provides ultra-fast inference for open-weight models including Llama 3.1 8B, Llama 3.3 70B, and OpenAI's GPT-OSS 120B/20B at zero cost through their free tier. Groq Compound and Compound Mini are agentic systems with built-in web search and code execution. Consilium uses Groq as the primary platform free-tier fallback (CONSILIUM_FREE_TIER_GROQ_KEY) when no BYOK key is configured.
Environment Variable
GROQ_API_KEY
API Client
OpenAI-compatible (api.groq.com/openai/v1)
Strengths
Free open-weight models, fastest inference, agentic compound systems
xAI launched Grok 4.20 in February 2026 with a four-agent architecture for reasoning. Grok 4.1 Fast (reasoning and non-reasoning variants) and Grok Code Fast cover the lower-cost tiers. The API uses the OpenAI-compatible format. Grok 2/2-mini and grok-beta are legacy models whose traffic has been migrated to the newer line.
Environment Variable
XAI_API_KEY
API Client
OpenAI-compatible (api.x.ai/v1)
Strengths
Multi-agent reasoning, real-time knowledge, fast coding tasks
Moonshot's Kimi K2.6 (released April 2026) is a frontier-scale 1T-parameter open-source MoE model with a 262k context window, multi-turn tool calling, vision inputs, and structured outputs for agentic workloads. The API is OpenAI-compatible.
Environment Variable
MOONSHOT_API_KEY
API Client
OpenAI-compatible (platform.moonshot.ai)
Strengths
1T parameters, agentic tool use, long-context coding stability
OpenRouter aggregates access to dozens of models behind one OpenAI-compatible endpoint, including a free tier for popular community models like Gemma 4, Qwen3 Coder, Nemotron 3 Super 120B, and Ling 2.6 1T (rate-limited at 20 req/min, 50 req/day per OpenRouter's April 2026 free-tier policy). Consilium uses OpenRouter as the secondary free-tier fallback (CONSILIUM_FREE_TIER_OPENROUTER_KEY) when Groq is unavailable.
Environment Variable
OPENROUTER_API_KEY
API Client
OpenAI-compatible (openrouter.ai/api/v1)
Strengths
Free tier breadth, single endpoint for many providers, easy fallback
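Groq, xAI, Moonshot, and OpenRouter all expose the same OpenAI-compatible Chat Completions wire format, differing only in base URL and key. A stdlib-only sketch of building such a request; the model ID and key placeholder are illustrative:

```python
import json
import urllib.request

# Base URLs from the provider sections above. All four share the
# OpenAI-compatible Chat Completions request shape.
BASES = {
    "groq": "https://api.groq.com/openai/v1",
    "xai": "https://api.x.ai/v1",
    "moonshot": "https://api.moonshot.ai/v1",
    "openrouter": "https://openrouter.ai/api/v1",
}

def build_request(provider: str, api_key: str, model: str, prompt: str):
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASES[provider]}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("groq", "gsk_placeholder", "llama-3.3-70b-versatile", "Hello")
print(req.full_url)  # https://api.groq.com/openai/v1/chat/completions
```

In practice you would pass `base_url` to the `openai` SDK client instead; the point is that one request shape covers all four providers.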
Judge Model Priority
The judge model evaluates proposals and produces the final synthesis. Consilium selects the judge by walking a fixed provider priority order, using the first provider for which you have a valid key.
Free-Tier Fallback (BYOK Preserved)
Consilium is BYOK-first. When you supply your own provider API key, that key is always used — no fallback occurs. When no key is set for the requested provider, Consilium routes through a platform-hosted free-tier pool so you can keep working at zero cost. Resolution order:
- Your BYOK key for the requested provider (always wins)
- Self-hosted env var (e.g. OPENAI_API_KEY)
- Groq free-tier pool — CONSILIUM_FREE_TIER_GROQ_KEY
- OpenRouter free-tier pool — CONSILIUM_FREE_TIER_OPENROUTER_KEY
Tier is inferred from the requested model's catalog cost (fast / balanced / deep) and routed to a tier-equivalent free model. The CLI prints a pre-flight notice and a routing:fallback SSE event is emitted so you always know when fallback is active.
Tier-equivalent free models
- Groq (preferred)
- OpenRouter (backup)
Context Overflow Fallback
When a model returns a 413 or 400 error indicating the context is too large, Consilium automatically retries with a cheaper, smaller-context variant:
| Original Model | Fallback Model |
|---|---|
| gpt-5.5-pro | gpt-5.4-mini |
| gpt-5.5 | gpt-5.4-mini |
| claude-opus-4-7 | claude-haiku-4-5-20251001 |
| claude-sonnet-4-6 | claude-haiku-4-5-20251001 |
| gemini-3.1-pro-preview | gemini-3-flash-preview |
| grok-4-20 | grok-4-1-fast-non-reasoning |
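The retry behavior above amounts to a one-shot fallback lookup. A minimal sketch; `ContextTooLarge` and the `call_model` stand-in are hypothetical names for illustration, while the mapping itself comes from the table:

```python
# Fallback mapping from the table above.
CONTEXT_FALLBACKS = {
    "gpt-5.5-pro": "gpt-5.4-mini",
    "gpt-5.5": "gpt-5.4-mini",
    "claude-opus-4-7": "claude-haiku-4-5-20251001",
    "claude-sonnet-4-6": "claude-haiku-4-5-20251001",
    "gemini-3.1-pro-preview": "gemini-3-flash-preview",
    "grok-4-20": "grok-4-1-fast-non-reasoning",
}

class ContextTooLarge(Exception):
    """Hypothetical error for a provider 413/400 context-size rejection."""

def with_context_fallback(call_model, model: str, prompt: str):
    """Try the requested model; on a context error, retry once with fallback."""
    try:
        return call_model(model, prompt)
    except ContextTooLarge:
        fallback = CONTEXT_FALLBACKS.get(model)
        if fallback is None:
            raise                      # no smaller variant to retry with
        return call_model(fallback, prompt)

# Simulated provider that rejects the oversized request:
def fake_call(model, prompt):
    if model == "gpt-5.5-pro":
        raise ContextTooLarge()
    return f"ok:{model}"

print(with_context_fallback(fake_call, "gpt-5.5-pro", "x"))  # ok:gpt-5.4-mini
```

Models with no entry in the table simply re-raise, since there is no cheaper variant to fall back to.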
Security
All API keys are encrypted with AES-256-GCM before storage. Keys are never stored in plaintext, never logged, and never transmitted to any third party.
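The at-rest pattern described above can be sketched with the `cryptography` package's AESGCM primitive. The nonce-prefixed storage format below is an assumption for illustration, not Consilium's actual schema:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_key(master_key: bytes, api_key: str) -> bytes:
    """Encrypt an API key with AES-256-GCM; returns nonce || ciphertext."""
    nonce = os.urandom(12)                 # 96-bit nonce, unique per record
    ct = AESGCM(master_key).encrypt(nonce, api_key.encode(), None)
    return nonce + ct                      # assumed storage layout

def decrypt_key(master_key: bytes, blob: bytes) -> str:
    """Split off the nonce and decrypt; raises on any tampering (GCM tag)."""
    nonce, ct = blob[:12], blob[12:]
    return AESGCM(master_key).decrypt(nonce, ct, None).decode()

master = AESGCM.generate_key(bit_length=256)   # 256-bit master key
blob = encrypt_key(master, "sk-ant-example")
assert decrypt_key(master, blob) == "sk-ant-example"
```

GCM's authentication tag means a flipped bit in storage fails decryption outright rather than yielding a corrupted key.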