Structured Deliberation Between AI Models

Not another orchestration tool. Consilium makes AI models argue, challenge, and synthesize - producing answers with tracked confidence, dissent, and audit trails.

How It Works

A structured 6-phase deliberation process inspired by academic debate and jury systems.

1. Propose - Each model independently analyzes the problem and presents its initial position.
2. Challenge - Models cross-examine each other, probing assumptions and identifying weaknesses.
3. Rebut - Models refine their positions based on the challenges, strengthening or revising their arguments.
4. Evaluate - A judge model assesses argument quality, evidence strength, and logical consistency.
5. Vote - Models cast confidence-weighted votes on the strongest positions.
6. Synthesize - A final synthesis integrates the best arguments into a single, rigorous answer.
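The six phases above can be sketched as a single loop. This is an illustrative sketch only, not Consilium's implementation: the model calls are stubbed with a placeholder `ask` function, and the scoring and voting rules are simplified stand-ins.

```python
# Stub standing in for a real model call; Consilium's actual phase
# prompts and judging logic are not public, so this is illustrative only.
def ask(model: str, prompt: str) -> str:
    return f"[{model}] response to: {prompt}"

def deliberate(question: str, models: list[str]) -> str:
    # 1. Propose: each model answers independently.
    positions = {m: ask(m, question) for m in models}
    # 2. Challenge: every model critiques every other position.
    challenges = {
        m: [ask(m, f"Critique: {positions[o]}") for o in models if o != m]
        for m in models
    }
    # 3. Rebut: models revise their positions given the challenges.
    positions = {
        m: ask(m, f"Revise {positions[m]} given {challenges[m]}")
        for m in models
    }
    # 4. Evaluate: a judge model scores each revised position.
    scores = {m: ask("judge", f"Score: {p}") for m, p in positions.items()}
    # 5. Vote: pick the strongest position (confidence weighting stubbed).
    winner = max(positions, key=lambda m: len(scores[m]))
    # 6. Synthesize: fold the strongest arguments into one answer.
    return ask("judge", f"Synthesize around: {positions[winner]}")

answer = deliberate("Should we ship?", ["model-a", "model-b"])
```

With real model calls, each phase would carry its own prompt template and the judge's scores would feed the vote.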

8 Deliberation Modes

Choose the right deliberation strategy for your use case.

• Quick (~15s) - Single round, fastest response. Best for simple questions that need a quick sanity check.
• Council (~45s) - Multi-round deliberation between models. The default mode for most decisions.
• Deep (~90s) - Extended deliberation with sub-agent research for complex, high-stakes questions.
• Blind (~45s) - Model names hidden until scored. Eliminates brand bias from evaluation.
• Red Team (~120s) - Adversarial assessment in which models actively try to break each other's arguments.
• Jury (~60s) - Panel deliberation with structured voting. Models must reach consensus or declare dissent.
• Market (~90s) - Prediction-market-style confidence aggregation. Models stake credibility on positions.
• Auto (~45s) - Automatically selects the best deliberation mode based on topic complexity.
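Several modes (Vote, Jury, Market) rest on confidence-weighted aggregation. A minimal sketch of how such a tally could work, with an assumed rule (weighted sum per position, dissent flagged when the winner holds less than two-thirds of total weight) that is not Consilium's published spec:

```python
from collections import defaultdict

# Each vote is (position backed, confidence in [0, 1]).
def tally(votes: list[tuple[str, float]]) -> tuple[str, float, bool]:
    weight: defaultdict[str, float] = defaultdict(float)
    for position, confidence in votes:
        weight[position] += confidence
    winner = max(weight, key=lambda p: weight[p])
    share = weight[winner] / sum(weight.values())
    dissent = share < 2 / 3  # a minority view worth reporting
    return winner, share, dissent

winner, share, dissent = tally([
    ("monolith", 0.9), ("monolith", 0.6), ("microservices", 0.4),
])
# "monolith" wins with ~79% of the weight, so no dissent is flagged
```

A Market-style variant would additionally adjust each model's future voting weight based on whether its staked positions held up.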

Why Deliberation > Orchestration

Orchestration runs models in parallel and picks the best. Deliberation makes them argue until the truth emerges.

Capability                      Deliberation   Orchestration
Multiple model perspectives     ✓              ✓
Models challenge each other     ✓              ✗
Structured argumentation        ✓              ✗
Dissent tracking                ✓              ✗
Confidence-weighted voting      ✓              ✗
Adversarial red-teaming         ✓              ✗
Blind evaluation mode           ✓              ✗
Audit trail of reasoning        ✓              ✗
One command to get started      ✓              ✓

Install the CLI

Run debates from your terminal - pipe in files, diffs, or stdin and stream the deliberation live.

# 1. Install the CLI globally
npm install -g @myconsilium/cli

# 2. Sign in (or run on the free tier with no key)
consilium login

# 3. Run your first debate
consilium debate "What's the best way to ship this feature?" \
  --mode council

SDK Examples

Integrate deliberation into your stack in minutes.

from consilium import ConsiliumClient, DeliberationMode

client = ConsiliumClient(
    api_url="https://api.myconsilium.xyz",
    api_key="your-key",
)

result = client.deliberate(
    "Should we migrate to microservices?",
    mode=DeliberationMode.COUNCIL,
    models=[
        "claude-sonnet-4-6",
        "gpt-5.4",
        "gemini-3-flash-preview",
    ],
)

print(result.golden_prompt)
print(result.confidence_scores)
print(result.dissent_report)
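One common pattern is gating automation on the result: accept the synthesized answer only when consensus is strong and no dissent was filed. The result shape below is a stand-in built from the three fields shown above; the real SDK types and an appropriate threshold may differ.

```python
from dataclasses import dataclass, field

# Minimal stand-in for the SDK's result object, using only the fields
# demonstrated above (golden_prompt, confidence_scores, dissent_report).
@dataclass
class DeliberationResult:
    golden_prompt: str
    confidence_scores: dict[str, float]
    dissent_report: list[str] = field(default_factory=list)

def accept(result: DeliberationResult, threshold: float = 0.7) -> bool:
    # Escalate to a human when average confidence is weak or any
    # model registered dissent.
    mean = sum(result.confidence_scores.values()) / len(result.confidence_scores)
    return mean >= threshold and not result.dissent_report

r = DeliberationResult(
    golden_prompt="Stay on the monolith; revisit at 50 engineers.",
    confidence_scores={"claude-sonnet-4-6": 0.82, "gpt-5.4": 0.74},
)
# accept(r) is True: mean confidence is 0.78 and no dissent is on file
```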

Supported Providers

Bring your own API keys. Consilium works with all major LLM providers.

Anthropic
OpenAI
Google
Groq
xAI
Moonshot
OpenRouter
Available in the CLI and Web app

Models on the Council

Mix any combination across providers. Models marked Free run on the no-key-required free tier.

Anthropic

Claude 4 family - strongest reasoning and synthesis.

  • Claude Haiku 4.5
    claude-haiku-4-5-20251001
  • Claude Sonnet 4.6
    claude-sonnet-4-6
  • Claude Opus 4.6
    claude-opus-4-6
  • Claude Opus 4.7
    claude-opus-4-7
OpenAI

GPT-5 series - fast, mini, and pro tiers.

  • GPT-5.4 Nano
    gpt-5.4-nano
  • GPT-5.4 Mini
    gpt-5.4-mini
  • GPT-5.4
    gpt-5.4
  • GPT-5.5
    gpt-5.5
  • GPT-5.5 Pro
    gpt-5.5-pro
Google

Gemini 3 - long context and fast multimodal.

  • Gemini 3.1 Flash-Lite
    gemini-3.1-flash-lite-preview
  • Gemini 3 Flash
    gemini-3-flash-preview
  • Gemini 3.1 Pro
    gemini-3.1-pro-preview
Groq

Sub-second inference. Free tier available.

  • Llama 3.1 8B Instant
    llama-3.1-8b-instant
    Free
  • Llama 3.3 70B Versatile
    llama-3.3-70b-versatile
    Free
  • GPT-OSS 120B (via Groq)
    openai/gpt-oss-120b
    Free
  • GPT-OSS 20B (via Groq)
    openai/gpt-oss-20b
    Free
  • Groq Compound
    groq/compound
xAI

Grok 4 - code-focused and reasoning variants.

  • Grok Code Fast
    grok-code-fast-1
  • Grok 4.1 Fast (non-reasoning)
    grok-4-1-fast-non-reasoning
  • Grok 4.1 Fast (reasoning)
    grok-4-1-fast-reasoning
  • Grok 4.20
    grok-4-20
Moonshot

Kimi K2 - long-context reasoning.

  • Kimi K2.6
    kimi-k2.6

No key, no problem. Start a debate with zero setup - Consilium routes free-tier requests through Groq and OpenRouter automatically. Bring your own keys anytime for premium models.

Research Backed

Consilium's deliberation approach is grounded in peer-reviewed research.

Debating with More Persuasive LLMs Leads to More Truthful Answers

Akbir Khan et al. - ICML 2024

AI debate produces more truthful answers than single-model prompting, even when one debater argues for the wrong answer.

Improving Factuality and Reasoning via Multiagent Debate

Yilun Du et al. - ICML 2024

Multi-agent debate significantly improves factual accuracy and mathematical reasoning across multiple benchmarks.

LLM Discussion: Enhancing the Creativity of LLMs via Discussion Framework

Li et al. - AAAI 2024

Structured discussion between LLMs produces more creative and diverse outputs than individual generation.

Scalable AI Safety via Doubly-Efficient Debate

Irving et al. - AI Safety Research

Debate between AI systems provides a scalable mechanism for aligning AI behavior with human values.

Your keys. Your control.

Bring your own provider keys and pay only for what you use.

End-to-end encryption • Bring Your Own Keys • CLI + SDK