Structured Deliberation Between AI Models
Consilium is a multi-AI council CLI and platform where 7 LLM providers debate, critique, and synthesize a consensus answer across 8 deliberation modes. Free to start with BYOK or the Groq free tier. A single CLI replaces Claude Code, Cursor CLI, Gemini CLI, and Grok Build - with the unique addition of cross-model debate.
Not another orchestration tool. Consilium makes AI models argue, challenge, and synthesize - producing answers with tracked confidence, dissent, and audit trails.
Which deliberation modes are available?
Consilium ships eight modes - Quick, Council, Deep, Blind, Red Team, Jury, Market, Auto - each tuned for a different stakes profile. Quick (~15s) for sanity checks, Council (~45s) as the default, Deep (~90s) for complex stakes, Red Team for adversarial review.
Quick: Single round, fastest response. Best for simple questions needing a fast sanity check.
Council: Multi-round deliberation between models. The default mode for most decisions.
Deep: Extended deliberation with sub-agent research for complex, high-stakes questions.
Blind: Model names hidden until scored. Eliminates brand bias from evaluation.
Red Team: Adversarial assessment where models actively try to break each other's arguments.
Jury: Panel deliberation with structured voting. Models must reach consensus or declare dissent.
Market: Prediction-market-style confidence aggregation. Models stake credibility on positions.
Auto: Automatically selects the best deliberation mode based on topic complexity.
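As a rough illustration of how the approximate timings above might guide mode choice, the sketch below picks the most thorough timed mode that fits a latency budget. The `Mode` enum, its string values (other than `council`, which the CLI uses), and `pick_mode` are hypothetical names for this example, not part of the Consilium SDK, and this is not how Auto mode is implemented.

```python
from enum import Enum


class Mode(Enum):
    """The eight mode names listed above (string values are assumptions)."""
    QUICK = "quick"
    COUNCIL = "council"
    DEEP = "deep"
    BLIND = "blind"
    RED_TEAM = "red_team"
    JURY = "jury"
    MARKET = "market"
    AUTO = "auto"


# Approximate latencies quoted above; the other modes have no stated figure.
TYPICAL_SECONDS = {Mode.QUICK: 15, Mode.COUNCIL: 45, Mode.DEEP: 90}


def pick_mode(budget_seconds: float) -> Mode:
    """Hypothetical helper: most thorough timed mode that fits the budget."""
    fitting = [m for m, secs in TYPICAL_SECONDS.items() if secs <= budget_seconds]
    if not fitting:
        return Mode.QUICK  # nothing fits, so fall back to the fastest mode
    return max(fitting, key=lambda m: TYPICAL_SECONDS[m])


print(pick_mode(60).value)   # council
print(pick_mode(5).value)    # quick
```

A budget of 60 seconds admits Quick and Council, so the helper returns Council; anything under 15 seconds falls back to Quick.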
How does a Consilium debate actually work?
Every debate moves through a structured six-phase process inspired by academic debate and jury systems: Propose, Challenge, Rebut, Evaluate, Vote, and Synthesize.
Propose
Each model independently analyzes the problem and presents its initial position.
Challenge
Models cross-examine each other, probing assumptions and identifying weaknesses.
Rebut
Models refine their positions based on challenges, strengthening or revising arguments.
Evaluate
A judge model assesses argument quality, evidence strength, and logical consistency.
Vote
Models cast confidence-weighted votes on the strongest positions.
Synthesize
A final synthesis integrates the best arguments into a single, rigorous answer.
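The six phases above can be sketched as a simple loop over stub models. This is a minimal illustration under stated assumptions, not Consilium's implementation: `run_debate`, `Transcript`, and `make_stub` are hypothetical names, the stubs return canned strings instead of calling an LLM, and having the judge also write the synthesis is an assumption (the text only says a judge handles Evaluate).

```python
from dataclasses import dataclass, field
from typing import Callable

PHASES = ("propose", "challenge", "rebut", "evaluate", "vote", "synthesize")

# A stub "model" is any callable taking (phase, question, context) and returning text.
Model = Callable[[str, str, list[str]], str]


@dataclass
class Transcript:
    question: str
    entries: list[tuple[str, str, str]] = field(default_factory=list)  # (phase, author, text)

    def by_phase(self, phase: str) -> list[str]:
        return [text for p, _, text in self.entries if p == phase]


def run_debate(question: str, models: dict[str, Model], judge: Model) -> Transcript:
    t = Transcript(question)
    for phase in PHASES:
        # Snapshot of everything said in earlier phases; taken once per phase,
        # so models do not see each other's output within the same phase.
        context = [text for _, _, text in t.entries]
        if phase in ("evaluate", "synthesize"):
            # Assumption: a single judge speaks in these two phases.
            t.entries.append((phase, "judge", judge(phase, question, context)))
        else:
            # Every council member speaks in the remaining phases.
            for name, model in models.items():
                t.entries.append((phase, name, model(phase, question, context)))
    return t


def make_stub(name: str) -> Model:
    """Canned responder standing in for a real LLM call."""
    return lambda phase, question, context: f"{name}:{phase}"


transcript = run_debate(
    "Should we migrate to microservices?",
    models={"a": make_stub("a"), "b": make_stub("b")},
    judge=make_stub("judge"),
)
print(len(transcript.entries))  # 10
```

With two council members, four phases produce two entries each and the two judge phases produce one each, giving ten transcript entries in phase order.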
Why is deliberation better than orchestration?
Orchestration runs models in parallel and picks the best. Deliberation makes them argue until the truth emerges. The table below shows the eight capabilities only deliberation provides.
| Capability | Deliberation | Orchestration |
|---|---|---|
| Multiple model perspectives | | |
| Models challenge each other | | |
| Structured argumentation | | |
| Dissent tracking | | |
| Confidence-weighted voting | | |
| Adversarial red-teaming | | |
| Blind evaluation mode | | |
| Audit trail of reasoning | | |
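Two rows in the table, confidence-weighted voting and dissent tracking, lend themselves to a small worked example. The sketch below is one plausible way to tally such votes, assuming each vote is a `(model, position, confidence)` tuple with confidence in the range 0 to 1; the `tally` function and the sample data are hypothetical and do not describe Consilium's actual scoring.

```python
from collections import defaultdict


def tally(votes: list[tuple[str, str, float]]) -> dict:
    """Sum confidence per position, pick the heaviest, and keep the dissenters."""
    if not votes:
        raise ValueError("at least one vote is required")
    weight: dict[str, float] = defaultdict(float)
    for _, position, confidence in votes:
        weight[position] += confidence
    # On a tie, max() keeps the position that was seen first.
    winner = max(weight, key=weight.get)
    total = sum(weight.values())
    return {
        "winner": winner,
        "support": weight[winner] / total if total else 0.0,
        "dissent": [(m, p, c) for m, p, c in votes if p != winner],
    }


result = tally([
    ("model-a", "migrate", 0.9),
    ("model-b", "stay", 0.6),
    ("model-c", "migrate", 0.5),
])
print(result["winner"])   # migrate
print(result["dissent"])  # [('model-b', 'stay', 0.6)]
```

Keeping the dissenting votes alongside the winner, rather than discarding them, is what allows a minority position to be reported with the final answer.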
How do I install the Consilium CLI?
One install command, then log in and run your first debate. The CLI streams deliberations live, accepts file and diff context, supports 50+ chat REPL slash commands, and ships with feature parity to Claude Code, Cursor CLI, Gemini CLI, and Grok Build.
# 1. Install the CLI (one-liner)
curl -fsSL https://install.myconsilium.xyz | sh
# 2. Sign in (or run on the free tier with no key)
consilium login
# 3. Run your first debate
consilium debate "What's the best way to ship this feature?" \
  --mode council
SDK Examples
How do I integrate Consilium into my stack?
Integrate deliberation in minutes via the Python SDK (consilium on PyPI), the TypeScript SDK (@myconsilium/sdk on npm), or the CLI. All three speak SSE for live streaming and share the same REST contract.
from consilium import ConsiliumClient, DeliberationMode

# Create a client (replace the placeholder key with your own).
client = ConsiliumClient(
    api_url="https://api.myconsilium.xyz",
    api_key="your-key",
)

# Run a Council-mode deliberation across three models.
result = client.deliberate(
    "Should we migrate to microservices?",
    mode=DeliberationMode.COUNCIL,
    models=["claude-sonnet-4-6", "gpt-5.4", "gemini-3-flash-preview"],
)

print(result.golden_prompt)
print(result.confidence_scores)
print(result.dissent_report)
Which LLM providers does Consilium support?
Consilium ships seven first-class adapters: Anthropic, OpenAI, Google, Groq, xAI, Moonshot, and OpenRouter. Bring your own API keys with zero markup, or start free on the Groq and OpenRouter free-tier pool.
Which models can sit on the council?
Mix any combination across providers. Models marked Free run on the no-key-required free tier.
Anthropic: Claude 4 family - strongest reasoning and synthesis.
- Claude Haiku 4.5 (claude-haiku-4-5-20251001)
- Claude Sonnet 4.6 (claude-sonnet-4-6)
- Claude Opus 4.6 (claude-opus-4-6)
- Claude Opus 4.7 (claude-opus-4-7)
OpenAI: GPT-5 series - fast, mini, and pro tiers.
- GPT-5.4 Nano (gpt-5.4-nano)
- GPT-5.4 Mini (gpt-5.4-mini)
- GPT-5.4 (gpt-5.4)
- GPT-5.5 (gpt-5.5)
- GPT-5.5 Pro (gpt-5.5-pro)
Google: Gemini 3 - long context and fast multimodal.
- Gemini 3.1 Flash-Lite (gemini-3.1-flash-lite-preview)
- Gemini 3 Flash (gemini-3-flash-preview)
- Gemini 3.1 Pro (gemini-3.1-pro-preview)
Groq: Sub-second inference. Free tier available.
- Llama 3.1 8B Instant (llama-3.1-8b-instant) - Free
- Llama 3.3 70B Versatile (llama-3.3-70b-versatile) - Free
- GPT-OSS 120B via Groq (openai/gpt-oss-120b) - Free
- GPT-OSS 20B via Groq (openai/gpt-oss-20b) - Free
- Groq Compound (groq/compound)
xAI: Grok 4 - code-focused and reasoning variants.
- Grok Code Fast (grok-code-fast-1)
- Grok 4.1 Fast, non-reasoning (grok-4-1-fast-non-reasoning)
- Grok 4.1 Fast, reasoning (grok-4-1-fast-reasoning)
- Grok 4.20 (grok-4.20)
Moonshot: Kimi K2 - long-context reasoning.
- Kimi K2.6 (kimi-k2.6)
No key, no problem. Start a debate with zero setup - Consilium routes free-tier requests through Groq and OpenRouter automatically. Bring your own keys anytime for premium models.
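To check in advance whether a council can run with no keys at all, one option is to compare its model IDs against the models marked Free in the list above. The sketch below does only that; `FREE_MODELS` and `models_needing_keys` are hypothetical names for illustration, and the set is copied from this page rather than fetched from any Consilium API.

```python
# Model IDs marked "Free" in the list above.
FREE_MODELS = {
    "llama-3.1-8b-instant",
    "llama-3.3-70b-versatile",
    "openai/gpt-oss-120b",
    "openai/gpt-oss-20b",
}


def models_needing_keys(council: list[str]) -> list[str]:
    """Return the council members that are not on the free tier, in order."""
    return [model_id for model_id in council if model_id not in FREE_MODELS]


council = ["llama-3.3-70b-versatile", "claude-sonnet-4-6", "openai/gpt-oss-20b"]
print(models_needing_keys(council))  # ['claude-sonnet-4-6']
```

An empty result means every model on the council is free-tier; any ID returned would need a provider key of your own.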
What research backs multi-agent deliberation?
Consilium's deliberation approach is grounded in peer-reviewed research from ICML 2024 and AAAI 2024, which together report 8-15% improvements in factual accuracy and reasoning over single-model prompting.
Akbir Khan et al. - ICML 2024
AI debate produces more truthful answers than single-model prompting, even when one debater argues for the wrong answer.
Yilun Du et al. - ICML 2024
Multi-agent debate significantly improves factual accuracy and mathematical reasoning across multiple benchmarks.
Li et al. - AAAI 2024
Structured discussion between LLMs produces more creative and diverse outputs than individual generation.
Irving et al. - AI Safety Research
Debate between AI systems provides a scalable mechanism for aligning AI behavior with human values.
How does pricing work? BYOK with zero markup.
Bring your own provider keys, pay your provider directly, and Consilium adds zero markup. Keys are AES-256-GCM encrypted at rest. Don't have a key? The Groq + OpenRouter free-tier pool covers up to 1,000 deliberations per month.