Structured Deliberation Between AI Models

Consilium is a multi-AI council CLI and platform where 7 LLM providers debate, critique, and synthesize a consensus answer across 8 deliberation modes. Free to start with BYOK or the Groq free tier. A single CLI replaces Claude Code, Cursor CLI, Gemini CLI, and Grok Build - with the unique addition of cross-model debate.

Not another orchestration tool. Consilium makes AI models argue, challenge, and synthesize - producing answers with tracked confidence, dissent, and audit trails.

Which deliberation modes are available?

Consilium ships eight modes - Quick, Council, Deep, Blind, Red Team, Jury, Market, Auto - each tuned for a different stakes profile. Quick (~15s) for sanity checks, Council (~45s) as the default, Deep (~90s) for complex stakes, Red Team for adversarial review.

Quick
~15s

Single round, fastest response. Best for simple questions needing a fast sanity check.

Council
~45s

Multi-round deliberation between models. The default mode for most decisions.

Deep
~90s

Extended deliberation with sub-agent research for complex, high-stakes questions.

Blind
~45s

Model names hidden until scored. Eliminates brand bias from evaluation.

Red Team
~120s

Adversarial assessment where models actively try to break each other's arguments.

Jury
~60s

Panel deliberation with structured voting. Models must reach consensus or declare dissent.

Market
~90s

Prediction market style confidence aggregation. Models stake credibility on positions.

Auto
~45s

Automatically selects the best deliberation mode based on topic complexity.
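
As a rough illustration of how the latencies above trade off against thoroughness, here is a minimal sketch that picks a mode for a given time budget. This is not how Auto mode works internally (the source does not describe that). The lowercase mode identifiers other than `council`, the `pick_mode` helper, and the preference order are all assumptions made for the example.

```python
from typing import Optional

# Approximate latencies in seconds, copied from the mode list above.
# Identifiers other than "council" are hypothetical spellings.
MODE_LATENCY_S = {
    "quick": 15,
    "council": 45,
    "deep": 90,
    "blind": 45,
    "red-team": 120,
    "jury": 60,
    "market": 90,
}

# Hypothetical preference order: most thorough general-purpose mode first.
GENERAL_PURPOSE = ["deep", "council", "quick"]


def pick_mode(budget_s: float, adversarial: bool = False) -> Optional[str]:
    """Return the most thorough mode that fits the budget, or None."""
    # Prefer Red Team for adversarial review when there is time for it.
    if adversarial and MODE_LATENCY_S["red-team"] <= budget_s:
        return "red-team"
    for mode in GENERAL_PURPOSE:
        if MODE_LATENCY_S[mode] <= budget_s:
            return mode
    return None  # nothing fits, not even Quick
```

With a 60-second budget this returns `council`; with 10 seconds it returns `None`, since even Quick needs about 15 seconds.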

How does a Consilium debate actually work?

Every debate moves through a structured six-phase process inspired by academic debate and jury systems: Propose, Challenge, Rebut, Evaluate, Vote, and Synthesize.

1. Propose

Each model independently analyzes the problem and presents its initial position.

2. Challenge

Models cross-examine each other, probing assumptions and identifying weaknesses.

3. Rebut

Models refine their positions based on challenges, strengthening or revising arguments.

4. Evaluate

A judge model assesses argument quality, evidence strength, and logical consistency.

5. Vote

Models cast confidence-weighted votes on the strongest positions.

6. Synthesize

A final synthesis integrates the best arguments into a single, rigorous answer.
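
The Vote and Synthesize phases can be sketched as a confidence-weighted tally. This is an illustrative sketch only: the source does not specify Consilium's aggregation formula, so the `Vote` type, the `tally` and `summarize` helpers, and the "share of total confidence" score are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Vote:
    voter: str         # model casting the vote
    position: str      # the position it backs
    confidence: float  # 0.0 to 1.0


def tally(votes: List[Vote]) -> Dict[str, float]:
    """Sum confidence per position (confidence-weighted voting)."""
    totals: Dict[str, float] = defaultdict(float)
    for vote in votes:
        totals[vote.position] += vote.confidence
    return dict(totals)


def summarize(votes: List[Vote]) -> dict:
    """Pick the winning position and record who dissented."""
    totals = tally(votes)
    winner = max(totals, key=totals.get)
    total = sum(totals.values())
    # Guard against every voter reporting zero confidence.
    share = totals[winner] / total if total else 0.0
    dissent = [v.voter for v in votes if v.position != winner]
    return {"answer": winner, "confidence": round(share, 2), "dissent": dissent}


votes = [
    Vote("model-a", "monolith", 0.9),
    Vote("model-b", "microservices", 0.6),
    Vote("model-c", "monolith", 0.5),
]
result = summarize(votes)
```

Here two of three voters back the same position, so it wins with the larger confidence total, and `model-b` is listed as the dissenter instead of being silently dropped.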

Why is deliberation better than orchestration?

Orchestration runs models in parallel and picks the best. Deliberation makes them argue until the truth emerges. The table below shows the eight capabilities only deliberation provides.

Capability | Deliberation | Orchestration
Multiple model perspectives
Models challenge each other
Structured argumentation
Dissent tracking
Confidence-weighted voting
Adversarial red-teaming
Blind evaluation mode
Audit trail of reasoning

One command to get started

How do I install the Consilium CLI?

One install command, then log in and run your first debate. The CLI streams deliberations live, accepts file and diff context, supports 50+ chat REPL slash commands, and ships with feature parity with Claude Code, Cursor CLI, Gemini CLI, and Grok Build.

# 1. Install the CLI (one-liner)
curl -fsSL https://install.myconsilium.xyz | sh

# 2. Sign in (or run on the free tier with no key)
consilium login

# 3. Run your first debate
consilium debate "What's the best way to ship this feature?" \
  --mode council

SDK Examples

How do I integrate Consilium into my stack?

Integrate deliberation in minutes via the Python SDK (consilium on PyPI), the TypeScript SDK (@myconsilium/sdk on npm), or the CLI. All three speak SSE for live streaming and share the same REST contract.

from consilium import ConsiliumClient, DeliberationMode

client = ConsiliumClient(
    api_url="https://api.myconsilium.xyz",
    api_key="your-key",
)

result = client.deliberate(
    "Should we migrate to microservices?",
    mode=DeliberationMode.COUNCIL,
    models=[
        "claude-sonnet-4-6",
        "gpt-5.4",
        "gemini-3-flash-preview",
    ],
)

print(result.golden_prompt)
print(result.confidence_scores)
print(result.dissent_report)
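
Since the SDKs and CLI stream over SSE, it may help to see what consuming such a stream involves. The sketch below is a minimal, standard-library parser for the SSE wire format (`event:` and `data:` fields, blank line as the event terminator). The `phase` event name and the JSON payload shape are hypothetical; the source does not document Consilium's actual event schema.

```python
import json
from typing import Dict, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per SSE event from an iterable of raw text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has data.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, commonly used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip the single optional leading space
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


# Hypothetical stream: event names and payloads are made up for illustration.
SAMPLE = [
    'event: phase\n',
    'data: {"name": "propose"}\n',
    '\n',
    ': keep-alive\n',
    'event: phase\n',
    'data: {"name": "challenge"}\n',
    '\n',
]

events = list(parse_sse(SAMPLE))
phases = [json.loads(e["data"])["name"] for e in events]
```

In practice the SDK presumably handles this for you; the sketch only shows what "speaking SSE" means at the wire level.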

Which LLM providers does Consilium support?

Consilium ships seven first-class adapters: Anthropic, OpenAI, Google, Groq, xAI, Moonshot, and OpenRouter. Bring your own API keys with zero markup, or start free using the Groq pool fallback.

Anthropic
OpenAI
Google
Groq
xAI
Moonshot
OpenRouter
Available in the CLI and Web app

Which models can sit on the council?

Mix any combination across providers. Models marked Free run on the no-key-required free tier.

Anthropic

Claude 4 family - strongest reasoning and synthesis.

  • Claude Haiku 4.5
    claude-haiku-4-5-20251001
  • Claude Sonnet 4.6
    claude-sonnet-4-6
  • Claude Opus 4.6
    claude-opus-4-6
  • Claude Opus 4.7
    claude-opus-4-7
OpenAI

GPT-5 series - fast, mini, and pro tiers.

  • GPT-5.4 Nano
    gpt-5.4-nano
  • GPT-5.4 Mini
    gpt-5.4-mini
  • GPT-5.4
    gpt-5.4
  • GPT-5.5
    gpt-5.5
  • GPT-5.5 Pro
    gpt-5.5-pro
Google

Gemini 3 - long context and fast multimodal.

  • Gemini 3.1 Flash-Lite
    gemini-3.1-flash-lite-preview
  • Gemini 3 Flash
    gemini-3-flash-preview
  • Gemini 3.1 Pro
    gemini-3.1-pro-preview
Groq

Sub-second inference. Free tier available.

  • Llama 3.1 8B Instant
    llama-3.1-8b-instant
    Free
  • Llama 3.3 70B Versatile
    llama-3.3-70b-versatile
    Free
  • GPT-OSS 120B (via Groq)
    openai/gpt-oss-120b
    Free
  • GPT-OSS 20B (via Groq)
    openai/gpt-oss-20b
    Free
  • Groq Compound
    groq/compound
xAI

Grok 4 - code-focused and reasoning variants.

  • Grok Code Fast
    grok-code-fast-1
  • Grok 4.1 Fast (non-reasoning)
    grok-4-1-fast-non-reasoning
  • Grok 4.1 Fast (reasoning)
    grok-4-1-fast-reasoning
  • Grok 4.20
    grok-4.20
Moonshot

Kimi K2 - long-context reasoning.

  • Kimi K2.6
    kimi-k2.6

No key, no problem. Start a debate with zero setup - Consilium routes free-tier requests through Groq and OpenRouter automatically. Bring your own keys anytime for premium models.
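
Before mixing a council, it can be handy to check which provider keys a roster would need. The sketch below uses model IDs and Free markers from the catalog above. The `keys_required` helper is hypothetical, and it assumes that any model not marked Free needs a key for its own provider.

```python
from typing import Dict, List, Set

# Model IDs marked Free in the catalog above.
FREE_MODELS: Set[str] = {
    "llama-3.1-8b-instant",
    "llama-3.3-70b-versatile",
    "openai/gpt-oss-120b",
    "openai/gpt-oss-20b",
}

# A subset of the catalog, mapping model ID to provider.
PROVIDER_OF: Dict[str, str] = {
    "claude-sonnet-4-6": "Anthropic",
    "gpt-5.4": "OpenAI",
    "gemini-3-flash-preview": "Google",
    "grok-4.20": "xAI",
    "kimi-k2.6": "Moonshot",
    "groq/compound": "Groq",
    **{model: "Groq" for model in FREE_MODELS},
}


def keys_required(council: List[str]) -> List[str]:
    """Return the sorted providers whose API key this council would need."""
    unknown = [m for m in council if m not in PROVIDER_OF]
    if unknown:
        raise ValueError(f"unknown model IDs: {unknown}")
    # Assumption: only models not marked Free require a provider key.
    return sorted({PROVIDER_OF[m] for m in council if m not in FREE_MODELS})
```

A council of `llama-3.3-70b-versatile` and `openai/gpt-oss-20b` returns an empty list under this assumption, while adding `claude-sonnet-4-6` returns `["Anthropic"]`.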

What research backs multi-agent deliberation?

Consilium's deliberation approach is grounded in peer-reviewed 2024 studies on multi-agent debate and discussion, which together report 8-15% improvements in factual accuracy and reasoning over single-model prompting.

Debating with More Persuasive LLMs Leads to More Truthful Answers

Akbir Khan et al. - ICML 2024

AI debate produces more truthful answers than single-model prompting, even when one debater argues for the wrong answer.

Improving Factuality and Reasoning in Language Models through Multiagent Debate

Yilun Du et al. - ICML 2024

Multi-agent debate significantly improves factual accuracy and mathematical reasoning across multiple benchmarks.

LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play

Li-Chun Lu et al. - COLM 2024

Structured discussion between LLMs produces more creative and diverse outputs than individual generation.

Scalable AI Safety via Doubly-Efficient Debate

Jonah Brown-Cohen et al. - ICML 2024

Debate between AI systems provides a scalable mechanism for aligning AI behavior with human values.

How does pricing work? BYOK with zero markup.

Bring your own provider keys, pay your provider directly, and Consilium adds zero markup. Keys are AES-256-GCM encrypted at rest. Don't have a key? The Groq + OpenRouter free-tier pool covers up to 1,000 deliberations per month.

End-to-end encryption | Bring Your Own Keys | CLI + SDK