
What We Found Auditing Our Own Model Catalog Against the Live Provider Docs

Saad Kadri · April 25, 2026 · 7 min read

On April 25, 2026 we re-verified every model ID Consilium ships against each provider's own documentation page. The audit was prompted by a simple worry: catalogs go stale, and a stale ID becomes a 404 on the user's next debate. We expected to find small drift. We found three real bugs.

Bug 1 — xAI uses dashes, not dots

Our catalog listed grok-4.20. The xAI native API uses grok-4-20. The dot was an authoring typo carried forward through the catalog, the per-provider default model in xai_agent.py, the cost map in the deliberation graph, the cheap-variants table in the orchestrator, and the marketing pricing page. xAI's naming convention everywhere else (grok-4-1-fast-reasoning, grok-4-1-fast-non-reasoning, grok-code-fast-1) uses dashes — we just spelled the 4.20 model wrong from the start.

The fix was a global rename plus an alias entry mapping grok-4.20 → grok-4-20, so any externally stored debate session created before the fix still resolves to a callable target.
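The alias layer is a small lookup that runs before any provider call. A minimal sketch of the idea (`ALIASES` and `resolve_model` are illustrative names, not our actual config):

```python
# Illustrative sketch of a forward-alias layer for retired or misspelled
# model IDs. ALIASES and resolve_model are hypothetical names.
ALIASES = {
    "grok-4.20": "grok-4-20",  # the dot was an authoring typo
}

def resolve_model(model_id: str) -> str:
    """Follow alias chains until we reach a canonical, callable ID."""
    seen = set()
    while model_id in ALIASES:
        if model_id in seen:  # guard against accidental alias cycles
            raise ValueError(f"alias cycle at {model_id!r}")
        seen.add(model_id)
        model_id = ALIASES[model_id]
    return model_id
```

Chaining matters here: if grok-4-20 itself retires later, a second alias entry forwards old sessions two hops without touching stored data.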

Bug 2 — OpenRouter's free roster had rotated entirely

Consilium uses OpenRouter as the secondary free-tier fallback when Groq is unavailable. We had five free-tier IDs hard-coded:

  • meta-llama/llama-3.3-70b-instruct:free
  • google/gemma-2-9b-it:free
  • mistralai/mistral-7b-instruct:free
  • nvidia/nemotron-4-340b-instruct:free
  • qwen/qwen-2.5-72b-instruct:free

Fetching openrouter.ai/collections/free-models returned zero of those slugs. The free roster had moved entirely to a newer generation: Gemma 4, Qwen3 Coder, Nemotron 3 Super, Ling 2.6, GLM 4.5 Air. Every debate that hit the OpenRouter fallback path before April 25 was attempting to call a model OpenRouter no longer routes.
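The check itself reduces to a set difference between the IDs we ship and the IDs the provider currently lists. A sketch under the assumption that the live list has already been fetched (`find_stale` is a hypothetical helper; in practice the live set would come from OpenRouter's model listing, not a hard-coded value):

```python
# Sketch of the staleness check: compare hard-coded free-tier slugs
# against whatever the provider currently routes. find_stale is a
# hypothetical helper; LIVE would normally be fetched, not typed in.
def find_stale(shipped: set[str], live: set[str]) -> set[str]:
    """Return every shipped model ID the provider no longer lists."""
    return shipped - live

SHIPPED = {
    "meta-llama/llama-3.3-70b-instruct:free",
    "google/gemma-2-9b-it:free",
    "mistralai/mistral-7b-instruct:free",
    "nvidia/nemotron-4-340b-instruct:free",
    "qwen/qwen-2.5-72b-instruct:free",
}
```

On April 25 the difference was the whole set: all five shipped slugs were stale.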

We replaced the catalog with five current free IDs, updated the free-tier resolver's tier-equivalence table (fast → google/gemma-4-26b-a4b-it:free, balanced → qwen/qwen3-coder:free, deep → nvidia/nemotron-3-super-120b-a12b:free), and aliased the five retired IDs forward.
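The tier-equivalence table is just a mapping from abstract capability tier to a currently routed free ID. A sketch with the IDs from the fix (`TIER_DEFAULTS` and `free_model_for_tier` are illustrative names):

```python
# Sketch of the free-tier equivalence table after the fix.
# TIER_DEFAULTS is an illustrative name; the IDs are the ones
# described in the post.
TIER_DEFAULTS = {
    "fast": "google/gemma-4-26b-a4b-it:free",
    "balanced": "qwen/qwen3-coder:free",
    "deep": "nvidia/nemotron-3-super-120b-a12b:free",
}

def free_model_for_tier(tier: str) -> str:
    """Map an abstract capability tier to a currently routed free model."""
    try:
        return TIER_DEFAULTS[tier]
    except KeyError:
        raise ValueError(f"unknown tier {tier!r}") from None
```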

Bug 3 — Moonshot's catalog was incomplete

We listed only kimi-k2.6, the latest flagship. Moonshot's K2 family is broader than that — the K2.6 quickstart page also lists kimi-k2.5, kimi-k2-thinking, kimi-k2-thinking-turbo, and kimi-k2-turbo-preview. All five are callable today and serve different cost/latency profiles. Our catalog now carries them all.

What didn't change

Anthropic, OpenAI, Google, and Groq came back clean. All four models in the Anthropic table (claude-opus-4-7, claude-sonnet-4-6, claude-opus-4-6, claude-haiku-4-5-20251001), all five OpenAI IDs (the full GPT-5.4 / GPT-5.5 family), all three current Gemini 3.x previews, and all six Groq production models matched their provider docs verbatim.

The deprecation calendar that came out of it

The same audit also surfaced six legacy model IDs scheduled for shutdown between June and October 2026 — including gemini-2.0-flash on June 1, claude-sonnet-4 and claude-opus-4 on June 15, and gemini-2.5-pro / gemini-2.5-flash on June 17. Each is aliased forward in our config so debates that request a soon-to-retire ID resolve to a current model automatically. Details in the next post.
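A deprecation alias only needs the retiring ID, a replacement target, and the sunset date, so the substitution can be date-gated and logged ahead of the cutover. A sketch of the shape, assuming a hypothetical `SUNSETS` structure (the replacement targets below are placeholders, not our actual substitution choices):

```python
from datetime import date

# Sketch of date-aware forward-aliasing for the shutdown calendar.
# SUNSETS is a hypothetical structure; the replacement targets are
# placeholders, and the dates are the ones the audit surfaced.
SUNSETS = {
    "gemini-2.0-flash": (date(2026, 6, 1), "replacement-model-a"),
    "claude-sonnet-4": (date(2026, 6, 15), "replacement-model-b"),
}

def resolve_with_sunset(model_id: str, today: date) -> str:
    """Substitute a retiring ID once its sunset date has passed."""
    entry = SUNSETS.get(model_id)
    if entry is None:
        return model_id
    sunset, replacement = entry
    return replacement if today >= sunset else model_id
```

Keeping the date in the table also lets the resolver emit a warning for requests that land inside the runway, before the hard switch.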

Why we publish the audit

Multi-provider catalogs rot. The seven-provider lineup we ship is the largest in this category, which means the surface area for rot is larger too. We commit the audit doc (docs/design/model-freshness-2026-04.md) into the repo so that the next time someone wonders why we substituted qwen/qwen3-coder:free for qwen-2.5-72b-instruct:free, the answer, with the verbatim provider URL, is one git-blame away.