Multi-AI Deliberation

Structured Disagreement Produces Better Decisions

Consilium implements formal argumentation protocols — proven in peer-reviewed research at ICML, ACL, and AAAI — where AI models propose, challenge, defend, and synthesize positions through adversarial debate.

Our Mission

We believe the best decisions emerge from structured disagreement. Consilium implements formal argumentation protocols — proven in peer-reviewed research — where AI models propose, challenge, defend, and synthesize through adversarial debate.

The result is consensus with tracked confidence, preserved dissent, and complete audit trails. Every conclusion is backed by evidence that survived adversarial scrutiny — not the output of a single model that was never challenged.

8 Deliberation Modes
7 LLM Providers
15 Models Supported
6 Vertical Templates
4 Voting Algorithms
3 Convergence Metrics
2 SDKs + CLI

What Makes Consilium Different

Six technical differentiators that separate deliberation from orchestration.

True Deliberation, Not Orchestration

Orchestration tools (CrewAI, AutoGen, LangGraph) run models in parallel and pick the best output. Consilium makes models argue, challenge claims, defend positions, vote, and only converge when mathematically confirmed. Cross-examination uses typed challenges (factual error, missing evidence, flawed logic) and categorized rebuttals (concede, refute, qualify, redirect). Each challenge must reference specific claims, and each rebuttal must provide evidence — not hand-waving.

Challenge types: FACTUAL_ERROR, MISSING_EVIDENCE, FLAWED_LOGIC. Rebuttal types: CONCEDE, REFUTE, QUALIFY, REDIRECT.
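Consilium's internal schema is not published here, but as an illustration of the typed exchange, the challenge/rebuttal vocabulary above might be modeled like this (class and field names are hypothetical):

```python
from dataclasses import dataclass
from enum import Enum

class ChallengeType(Enum):
    FACTUAL_ERROR = "factual_error"
    MISSING_EVIDENCE = "missing_evidence"
    FLAWED_LOGIC = "flawed_logic"

class RebuttalType(Enum):
    CONCEDE = "concede"
    REFUTE = "refute"
    QUALIFY = "qualify"
    REDIRECT = "redirect"

@dataclass
class Challenge:
    kind: ChallengeType
    claim_id: str      # must reference a specific claim
    rationale: str

@dataclass
class Rebuttal:
    kind: RebuttalType
    challenge: Challenge
    evidence: str      # required: a rebuttal without evidence is rejected
```

Typing the exchange this way is what lets the engine enforce "each challenge must reference specific claims" structurally rather than by prompt convention.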
Formal Voting Theory

Condorcet method finds the candidate that beats ALL others pairwise. Borda count provides confidence-weighted scoring across all positions. Ranked Pairs delivers cycle-free tiebreaking using a directed acyclic graph of pairwise victories. Copeland scoring enables comparative analysis by counting net pairwise wins. This is real social choice theory applied to AI consensus — not majority voting, not picking the most popular answer.

Algorithms: Condorcet, Borda Count, Ranked Pairs, Copeland.
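As a sketch of the first two algorithms (not Consilium's production code), a Condorcet winner and Borda scores can be computed from ranked ballots in a few lines:

```python
def condorcet_winner(ballots):
    """Candidate that beats every other candidate pairwise, or None when
    the pairwise results form a cycle. Each ballot is a list of
    candidates, best first."""
    candidates = set(ballots[0])

    def margin(a, b):
        # positive when more ballots rank a above b
        return sum(1 if ballot.index(a) < ballot.index(b) else -1
                   for ballot in ballots)

    for a in candidates:
        if all(margin(a, b) > 0 for b in candidates if b != a):
            return a
    return None  # cycle: fall back to Ranked Pairs or Copeland

def borda_count(ballots):
    """Candidate ranked i-th on a ballot of n candidates earns n-1-i points."""
    n = len(ballots[0])
    scores = {c: 0 for c in ballots[0]}
    for ballot in ballots:
        for i, c in enumerate(ballot):
            scores[c] += n - 1 - i
    return scores
```

For example, with ballots [["A","B","C"], ["A","C","B"], ["B","A","C"]], A beats both B and C pairwise, so A is the Condorcet winner; the Borda totals are A=5, B=3, C=1.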
Mathematical Convergence Detection

Convergence is measured using Kendall tau correlation (0.4 weight) for ranking similarity, Jaccard index (0.35 weight) for proposal overlap, and concession tracking (0.25 weight) for position shifts. The composite score must reach 0.85 before consensus is declared. If convergence stalls, the system detects it and can trigger additional rounds or escalate to a different mode. Not vibes-based — mathematically verified.

Formula: 0.4 * kendall_tau + 0.35 * jaccard + 0.25 * concession_rate >= 0.85
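The composite is straightforward to compute. A self-contained pure-Python sketch (function names are illustrative; the production implementation may use library routines):

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall rank correlation between two strict rankings of the
    same items (each a list of items, best first)."""
    pos_a = {x: i for i, x in enumerate(rank_a)}
    pos_b = {x: i for i, x in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    pairs = len(rank_a) * (len(rank_a) - 1) / 2
    return (concordant - discordant) / pairs

def jaccard(a, b):
    """Overlap between two proposal sets."""
    return len(a & b) / len(a | b)

def convergence_score(rank_prev, rank_curr, props_prev, props_curr,
                      concession_rate):
    # weights match the formula above; consensus requires >= 0.85
    return (0.4 * kendall_tau(rank_prev, rank_curr)
            + 0.35 * jaccard(props_prev, props_curr)
            + 0.25 * concession_rate)
```

Stable rankings and identical proposal sets push the score to the top of the range; divergent rounds land below the 0.85 bar and trigger another round or an escalation.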
Mandatory Dissent Preservation

Agglomerative clustering identifies minority positions across model responses by measuring semantic distance between position vectors. Every result includes both majority AND minority opinions. Healthcare, legal, and financial modes require explicit dissent reporting. No decision is declared unanimous unless mathematically verified through convergence metrics — and even then, the clustering algorithm surfaces the most distant position as a recorded dissent.

Clustering: agglomerative, distance-based. Output: majority position + all minority clusters.
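To illustrate the idea (Consilium's actual distance metric and linkage are not specified here), a minimal single-linkage agglomerative pass over position vectors, merging until no pair of clusters sits within the distance threshold:

```python
import math

def agglomerative_clusters(vectors, threshold):
    """Single-linkage agglomerative clustering. vectors maps a model
    name to its position vector; clusters merge while the closest
    inter-cluster distance stays within threshold."""
    clusters = [[name] for name in vectors]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # remaining clusters are genuinely distant positions
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    # largest cluster = majority position; the rest are preserved dissent
    clusters.sort(key=len, reverse=True)
    return clusters[0], clusters[1:]
```

With two models near each other and one far away, the far model survives as its own cluster and is reported as dissent rather than averaged into the majority.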
Confidence Calibration

Models that change their claims under cross-examination pressure receive lower confidence scores. Calibration formula: stability * (1 - concession_rate) * (1 - 0.3 * qualification_rate). This measures explanation stability — do models hold firm on well-supported positions, or cave under scrutiny? Models that maintain their position with evidence get higher calibration; models that flip without justification get penalized.

Score = stability * (1 - concession_rate) * (1 - 0.3 * qualification_rate)
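The formula is direct to compute; a one-function sketch:

```python
def calibration(stability, concession_rate, qualification_rate):
    """All inputs in [0, 1]. Concessions reduce confidence at full
    weight; qualifications reduce it at 30% weight."""
    return stability * (1 - concession_rate) * (1 - 0.3 * qualification_rate)
```

For example, a model with stability 0.9 that conceded 20% of challenges and qualified 50% of its claims scores 0.9 * 0.8 * 0.85 = 0.612.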
Complete Audit Trail

Every deliberation phase is recorded: input, output, tokens used, cost, and latency per model per round. Full transparency into how consensus was reached — which models agreed, who dissented, what challenges were raised, and how they were resolved. Token counts, cost breakdowns, and timing data enable cost optimization. Required for regulated industries like healthcare, finance, and legal.

Tracked per model: tokens_in, tokens_out, cost_usd, latency_ms, round, phase.
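Those per-phase fields map naturally onto a record type. As a hypothetical sketch of the shape of the trail:

```python
from dataclasses import dataclass, asdict

@dataclass
class AuditRecord:
    model: str
    round: int
    phase: str        # e.g. "cross_examination"
    tokens_in: int
    tokens_out: int
    cost_usd: float
    latency_ms: int

def total_cost(trail):
    """Cost roll-up across a deliberation's audit trail."""
    return sum(r.cost_usd for r in trail)
```

Because every record carries cost and latency per model per round, the same trail serves both compliance review and cost optimization.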

Our Story

Consilium started with a simple observation: when you ask one AI model a hard question, you get one perspective shaped by that model's training biases. Ask three models, and you get three perspectives — but no mechanism to resolve disagreements. We built that mechanism.

The breakthrough came from academic research on multi-agent debate. Papers from ICML 2024 showed that structured debate between LLMs improves factual accuracy by 8-15%, and that truth has a natural advantage in adversarial argumentation. We implemented these findings as a production platform with formal voting theory, convergence detection, and mandatory dissent preservation.

Consilium supports current-generation models across 7 providers: Anthropic (Claude Opus 4.7, Sonnet 4.6, Haiku 4.5), OpenAI (GPT-5.5 Pro, GPT-5.4), Google (Gemini 3.1 Pro, Gemini 3 Flash), xAI (Grok 4.20, Grok 4.1 Fast), Moonshot (Kimi K2.6), Groq for cost-effective inference (Llama 3.x, GPT-OSS, Compound), and OpenRouter for free-tier fallback. Models debate through a LangGraph state machine with typed challenges, categorized rebuttals, confidence-weighted voting, and mathematical convergence detection.

The architecture is a three-tier system: Next.js 15 frontend, NestJS 11 API with BullMQ job processing, and a FastAPI debate engine that orchestrates the deliberation state machine. Every phase is recorded for full auditability — which models agreed, who dissented, what evidence was cited, and how consensus was reached.

Architecture

Three-tier system with a LangGraph deliberation state machine.

Web (Next.js 15) → API (NestJS 11/Fastify) → Agents (FastAPI/Python)
                                                      ↓
                                             Debate Orchestrator
                                             ├── Round 1: Independent Analysis
                                             ├── Round 2: Cross-Examination
                                             ├── Round 3: Rebuttal & Refinement
                                             └── Judge: 5-Phase Synthesis

Voting: Condorcet → Borda Count → Ranked Pairs → Copeland
Convergence: Kendall τ + Jaccard + Concession Tracking (threshold: 0.85)
Dissent: Agglomerative Clustering → Minority Position Preservation

Meet the Founder

Why one developer is building the multi-AI council for everyone else.

Saad Kadri
Founder & Engineer

Hi, I'm Saad.

I build software for a living and got tired of the same pattern: ask one AI a hard question, get an answer that's almost right, lose two hours discovering the wrong half. The fix isn't a smarter single model — it's a room of models that argue, challenge each other, and only agree when they've really agreed. That's Consilium.

Mission

Make multi-AI deliberation the default for high-stakes engineering decisions. No more single-model guesses. No more provider lock-in. The council reads your code, debates the problem, and shows its work — so you can trust the answer or push back on it.

What I value

Provider neutrality: seven providers, zero lock-in. BYOK or run on the free tier.
Codebase-aware: the council reads your files. No more guessing at structure or stack.
Show the work: every claim is auditable. Dissent is preserved, not hidden.

Why I built Consilium

Every existing AI coding tool is a single model with a pretty wrapper. Cursor uses Claude. Copilot uses GPT. Gemini Code uses Gemini. Each one has blind spots, and pretending otherwise is how you ship subtly broken code.

Consilium puts seven of them in the same room — OpenAI, Anthropic, Google, Groq, xAI, Moonshot, OpenRouter — and makes them argue with each other on your codebase. When they disagree, you see the disagreement. When they converge, you know it's real, not a single model's preference. That's the tool I wanted, so I built it.

Built for teams

Bring your own provider keys and pay only for what you use. BYOK by default, encrypted at rest, with a full SDK and CLI story.

BYOK: Bring Your Own API Keys
Encrypted at rest: AES-256-GCM on every key
Free tier: Groq models included at $0
TypeScript SDK
Python SDK
CLI
REST API
SSE Streaming