Use Cases
How teams use multi-agent deliberation to make better decisions. Each use case maps to a specific deliberation mode, evaluation rubric, and output format grounded in peer-reviewed research on multi-agent deliberation.
Code Review
Three models independently review your code, each generating a comprehensive analysis of potential issues. Unlike traditional code review tools that run static analysis, Consilium's code review puts models into adversarial positions where they actively attack each other's findings, uncovering issues that surface only under cross-examination. The Red Team framework ensures that every vulnerability claim is stress-tested before reaching the final report.
During the Red Team phase, models issue typed challenges categorized as SECURITY_VULN, LOGICAL_FLAW, EDGE_CASE, or ROBUSTNESS_TEST. A defender model must rebut each challenge with evidence — conceding valid points, refuting false positives, qualifying edge cases, or redirecting to more critical issues. This adversarial dynamic mirrors real security audits where penetration testers and defenders engage in structured conflict to harden systems.
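The challenge/rebuttal exchange above can be sketched as typed data. The type and field names below are illustrative assumptions, not Consilium's actual API:

```typescript
// Illustrative sketch of the Red Team challenge/rebuttal exchange.
// All names here are assumptions for illustration, not a real API.
type ChallengeType = "SECURITY_VULN" | "LOGICAL_FLAW" | "EDGE_CASE" | "ROBUSTNESS_TEST";
type RebuttalStance = "concede" | "refute" | "qualify" | "redirect";

interface Challenge {
  type: ChallengeType;
  claim: string;    // what the attacker asserts about the code
  evidence: string; // line reference or failing input
}

interface Rebuttal {
  stance: RebuttalStance;
  justification: string;
}

// Every challenge must receive a rebuttal before the judge synthesizes
// the final report; this helper surfaces any that were left unanswered.
function unansweredChallenges(
  challenges: Challenge[],
  rebuttals: Map<Challenge, Rebuttal>
): Challenge[] {
  return challenges.filter((c) => !rebuttals.has(c));
}
```

The four rebuttal stances mirror the options described above: conceding valid points, refuting false positives, qualifying edge cases, or redirecting to more critical issues.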
The judge model synthesizes all findings into a final vulnerability report with severity ratings (critical/high/medium/low), maps each finding to the original code location, and includes the defender's rebuttals. The result is a structured, auditable code review that catches 30-40% more issues than single-model review. Each finding is cross-referenced against OWASP Top 10, CWE identifiers, and SANS 25 categories where applicable.
Example Prompt
Review this authentication middleware for security vulnerabilities:
```javascript
async function authMiddleware(req, res, next) {
  const token = req.headers.authorization;
  const decoded = jwt.verify(token, process.env.JWT_SECRET);
  req.user = await User.findById(decoded.id);
  next();
}
```

Output
Vulnerability report with severity ratings (critical/high/medium/low), defender rebuttals for each finding, judge's final assessment with prioritized remediation steps, and OWASP/CWE cross-references.
Why Deliberation Beats Single-Model
Single models miss 30-40% of security issues. Cross-examination forces models to justify their findings under adversarial pressure, reducing false positives and surfacing hidden vulnerabilities that no single model catches alone. The Red Team structure ensures the defender cannot dismiss legitimate findings, while the attacker cannot inflate severity without evidence.
| Criterion | Weight |
|---|---|
| Security | 30% |
| Correctness | 25% |
| Performance | 20% |
| Maintainability | 15% |
| Style | 10% |
```yaml
mode: "red_team"
template: "code_review"
models: 3
rounds: 2
require_dissent: true
require_citations: false
```

Research Synthesis
Models explore different perspectives on complex research topics, each bringing independent analysis of available evidence. The Council mode ensures diverse viewpoints are represented before any synthesis occurs, preventing the premature convergence that plagues single-model summarization. Three models deliberate across three rounds, with each round building on the previous one's findings and challenges.
During deliberation, models challenge each other's source interpretations, flag potential biases in cited research, and identify gaps in evidence coverage. Each claim must be backed by specific evidence, and models rate their confidence in each assertion. The confidence-weighted voting system (Condorcet + Borda count) ensures well-supported conclusions carry more weight than speculative claims.
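The Borda half of the confidence-weighted voting described above can be sketched as follows. The interface names are illustrative, the Condorcet check is omitted, and scaling Borda points linearly by stated confidence is an assumption about how the weighting works:

```typescript
// Confidence-weighted Borda scoring: each model ranks the candidate
// conclusions best-first, and a rank's Borda points are scaled by the
// model's self-rated confidence in its own ranking.
interface Ballot {
  ranking: string[];  // conclusion IDs, best first
  confidence: number; // 0..1, the model's stated confidence
}

function weightedBorda(ballots: Ballot[]): Map<string, number> {
  const scores = new Map<string, number>();
  for (const { ranking, confidence } of ballots) {
    const n = ranking.length;
    ranking.forEach((id, i) => {
      // Standard Borda points (n-1 for first place, 0 for last),
      // scaled by the ballot's confidence.
      const pts = (n - 1 - i) * confidence;
      scores.set(id, (scores.get(id) ?? 0) + pts);
    });
  }
  return scores;
}
```

Under this scheme a confident, well-supported ranking contributes more to the outcome than a hedged, speculative one, which is the behavior the voting system is designed to produce.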
The final synthesis includes a comprehensive overview with inline citations, a section of flagged uncertainties where models disagreed, and confidence scores for each major conclusion. Dissenting views are preserved — if one model identified contradictory evidence, that perspective is included alongside the majority position. The output distinguishes between strong consensus, weak consensus, and active disagreement.
Example Prompt
Synthesize current research on transformer architecture efficiency improvements, including sparse attention mechanisms, mixture of experts, and linear attention variants. Compare their tradeoffs for production deployment.

Output
Comprehensive synthesis with inline citations, flagged uncertainties with confidence intervals, per-conclusion confidence scores, and preserved minority opinions where models disagreed on evidence interpretation.
Why Deliberation Beats Single-Model
Multiple models reduce single-model hallucination and confirmation bias by up to 15%. When one model cites a finding, others verify it independently — catching fabricated citations and misrepresented conclusions that single-model approaches propagate unchecked. Three rounds of cross-examination force progressively deeper engagement with the evidence.
| Criterion | Weight |
|---|---|
| Accuracy | 30% |
| Evidence Quality | 25% |
| Completeness | 20% |
| Bias Awareness | 15% |
| Citation Quality | 10% |
```yaml
mode: "council"
template: "research_synthesis"
models: 3
rounds: 3
require_dissent: true
require_citations: false
```

Risk Assessment
Five models participate in a structured Jury deliberation with MANDATORY_DISSENT reporting across three rounds. Every risk assessment must include minority opinions — no conclusion is presented as unanimous unless mathematically verified through convergence detection (Kendall tau + Jaccard + concession rate >= 0.85). This prevents the groupthink that makes single-model risk assessments dangerously overconfident.
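The convergence check named above can be sketched for a pair of models as the mean of three signals: Kendall tau rank correlation between their risk orderings, Jaccard overlap of the risk sets they identified, and the observed concession rate, with unanimity declared at 0.85 or above. Combining the three as a simple mean is an assumption for illustration; the document does not specify the exact formula:

```typescript
// Kendall tau over the items both rankings share, in -1..1.
function kendallTau(a: string[], b: string[]): number {
  const pos = new Map(b.map((id, i) => [id, i]));
  const common = a.filter((id) => pos.has(id));
  let concordant = 0;
  let discordant = 0;
  for (let i = 0; i < common.length; i++) {
    for (let j = i + 1; j < common.length; j++) {
      // a ranks common[i] before common[j]; concordant if b agrees.
      if (pos.get(common[i])! < pos.get(common[j])!) concordant++;
      else discordant++;
    }
  }
  const pairs = concordant + discordant;
  return pairs === 0 ? 1 : (concordant - discordant) / pairs;
}

// Jaccard overlap of the two identified risk sets, in 0..1.
function jaccard(a: Set<string>, b: Set<string>): number {
  const inter = [...a].filter((x) => b.has(x)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 1 : inter / union;
}

// Unanimity requires the combined agreement score to reach 0.85.
function isUnanimous(rankA: string[], rankB: string[], concessionRate: number): boolean {
  const tau01 = (kendallTau(rankA, rankB) + 1) / 2; // rescale tau to 0..1
  const score =
    (tau01 + jaccard(new Set(rankA), new Set(rankB)) + concessionRate) / 3;
  return score >= 0.85;
}
```

The point of the threshold is that partial agreement is never silently promoted to consensus: anything below 0.85 keeps the dissent visible in the report.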
Each model independently identifies risks, assesses likelihood and impact on standardized scales, and proposes mitigation strategies. During deliberation, models challenge each other's likelihood estimates and impact assessments, forcing quantitative justification. A model claiming 'low probability' must defend that assessment against adversarial questioning from four other models across three rounds.
The output is a structured risk matrix with likelihood/impact ratings for each identified risk, detailed mitigation strategies with implementation timelines, and mandatory minority opinions. If even one model identifies a catastrophic risk that others dismiss, that dissent is prominently featured in the final report rather than averaged away. Agglomerative clustering groups related risks and surfaces overlooked tail risks.
Example Prompt
Assess risks of migrating from AWS to multi-cloud architecture (AWS + GCP + Azure). Consider operational complexity, data sovereignty, cost implications, team skill gaps, vendor lock-in tradeoffs, and disaster recovery scenarios.

Output
Risk matrix with likelihood/impact ratings, mitigation strategies with implementation order, mandatory minority opinions from all five models, and a dissent report highlighting risks that only some models identified.
Why Deliberation Beats Single-Model
MANDATORY_DISSENT ensures no risks are overlooked due to groupthink. In single-model assessments, the model's training biases determine which risks are emphasized. Five-model Jury deliberation with forced dissent surfaces the full risk landscape — including tail risks that any individual model would dismiss as unlikely.
| Criterion | Weight |
|---|---|
| Risk Identification | 25% |
| Likelihood Assessment | 20% |
| Impact Analysis | 20% |
| Mitigation Quality | 20% |
| Compliance | 15% |
```yaml
mode: "jury"
template: "risk_assessment"
models: 5
rounds: 3
require_dissent: true # MANDATORY
require_citations: false
mandatory_dissent: true
```

Healthcare Decision Support
Healthcare deliberations enforce REQUIRE_DISSENT and REQUIRE_CITATIONS as non-negotiable constraints. Every diagnostic suggestion must cite specific clinical evidence, and every differential diagnosis must include dissenting opinions. This reflects the medical principle that premature diagnostic closure is among the leading causes of diagnostic error — a problem that single-model systems systematically amplify.
Models independently evaluate patient presentations, each generating a ranked differential diagnosis with supporting evidence across three rounds of deliberation. During cross-examination, models challenge each other's diagnostic reasoning — questioning whether symptoms truly support a proposed diagnosis, flagging overlooked conditions, and identifying potential drug interactions or contraindications that any single model might miss.
The output includes a ranked differential diagnosis list with evidence chains for each condition, safety flags for critical findings that require immediate action, and explicit dissenting opinions where models disagreed on diagnosis likelihood. Every recommendation includes a confidence score calibrated by how well it withstood cross-examination — models that changed their diagnosis under pressure receive lower calibration scores.
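One simple way to implement the calibration described above is a multiplicative discount per revision made under cross-examination. This sketch is an assumption for illustration, including the 0.8 discount factor; the document does not specify the calibration formula:

```typescript
// A diagnosis that survived cross-examination unchanged keeps its
// confidence; each revision under adversarial pressure discounts it.
// The 0.8 discount factor is an illustrative assumption.
function calibratedConfidence(
  baseConfidence: number,
  revisionsUnderPressure: number
): number {
  return baseConfidence * Math.pow(0.8, revisionsUnderPressure);
}
```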
Example Prompt
Evaluate differential diagnosis for a 45-year-old patient presenting with acute onset chest pain radiating to the left arm, diaphoresis, elevated troponin, but normal ECG. Consider cardiac, pulmonary, and gastrointestinal etiologies.

Output
Ranked differential diagnosis with evidence chains for each condition, safety flags for critical findings requiring immediate action, dissenting opinions on diagnosis likelihood, and confidence scores calibrated by cross-examination resilience.
Why Deliberation Beats Single-Model
Safety-critical decisions need transparent disagreement and evidence chains. A single model might miss a rare but life-threatening diagnosis. REQUIRE_DISSENT ensures uncommon conditions are considered, and REQUIRE_CITATIONS prevents hallucinated medical guidance. Three rounds of deliberation force models to defend their diagnostic reasoning under adversarial scrutiny.
| Criterion | Weight |
|---|---|
| Evidence Quality | 30% |
| Diagnostic Accuracy | 25% |
| Safety Considerations | 20% |
| Completeness | 15% |
| Actionability | 10% |
```yaml
mode: "council"
template: "healthcare"
models: 3
rounds: 3
require_dissent: true # MANDATORY
require_citations: true # REQUIRED
```

Legal Analysis
Legal analysis uses Blind mode with a dialectical structure: one model argues risk, another argues acceptability, and evaluation happens without knowledge of which model produced which argument. This eliminates the brand bias where evaluators unconsciously favor responses from models they perceive as more authoritative. MANDATORY_DISSENT ensures both conservative and permissive legal interpretations are fully explored across three rounds.
The dialectical format ensures both sides of every legal question are thoroughly explored. The risk-arguing model must identify every potential compliance gap, liability exposure, and regulatory risk. The acceptability-arguing model must demonstrate why current language or practices are legally defensible. Neither model knows the other's position during initial analysis, and the judge evaluates arguments in multiple orderings to prevent position bias.
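The order-debiased judging described above can be sketched as scoring both anonymized arguments in both presentation orders and averaging, so neither side benefits from going first. The `judge` callback below is a stand-in for a model call, not a real API:

```typescript
// A judge receives two anonymized arguments in some order and returns
// a score for each, in presentation order.
type Judge = (first: string, second: string) => [number, number];

// Score both arguments in both orders and average per argument,
// cancelling any systematic first-position (or last-position) bias.
function debiasedScores(judge: Judge, argA: string, argB: string): [number, number] {
  const [a1, b1] = judge(argA, argB); // A presented first
  const [b2, a2] = judge(argB, argA); // B presented first
  return [(a1 + a2) / 2, (b1 + b2) / 2];
}
```

With a judge that systematically favors whichever argument it reads first, the averaged scores come out equal for equally strong arguments, which is exactly the bias this evaluation order is meant to remove.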
The blind judge evaluates arguments purely on legal merit, producing clause-by-clause risk ratings, regulatory gap analysis, and recommended revisions with alternative language. The final output includes a dissent report showing where the risk and acceptability models fundamentally disagreed, ensuring stakeholders see the full spectrum of legal opinion rather than a false consensus that masks genuine legal ambiguity.
Example Prompt
Review this SaaS terms of service for GDPR compliance risks. Evaluate data processing clauses, cross-border transfer mechanisms, data subject rights implementation, and breach notification procedures against current EU regulatory requirements.

Output
Clause-by-clause risk ratings (high/medium/low), regulatory gaps mapped to specific GDPR articles, recommended revisions with alternative language, and a dissent report showing where risk and acceptability models disagreed.
Why Deliberation Beats Single-Model
Blind evaluation eliminates model bias — the judge cannot favor a 'brand name' model's analysis. The dialectical format with MANDATORY_DISSENT ensures both conservative and permissive legal interpretations are explored across three rounds, giving stakeholders the full picture rather than a single model's risk tolerance.
| Criterion | Weight |
|---|---|
| Legal Accuracy | 30% |
| Risk Identification | 25% |
| Regulatory Compliance | 20% |
| Practicality | 15% |
| Clarity | 10% |
```yaml
mode: "blind"
template: "legal"
models: 2
rounds: 3
require_dissent: true # MANDATORY
require_citations: true # REQUIRED
mandatory_dissent: true
```

Financial Analysis
Financial analysis uses Jury mode with MANDATORY_DISSENT and requires quantitative metrics in every assessment. Three models must provide specific numerical analysis — VaR (Value at Risk), CVaR (Conditional Value at Risk), and Sharpe ratios — rather than qualitative hand-waving. Compliance mapping covers Basel III, SOX, Dodd-Frank, and MiFID II frameworks. Every quantitative claim is stress-tested across three rounds of deliberation.
During deliberation, models challenge each other's quantitative assumptions. If one model projects 12% returns, another must stress-test that assumption against historical drawdown scenarios, current market volatility, and macroeconomic indicators. Scenario analysis is mandatory: bull case, base case, bear case, and black swan scenarios must all be addressed with specific numerical projections and probability-weighted outcomes.
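The three metrics named above can be sketched from a sample of periodic returns using the historical-simulation method. The method choice, confidence level, and risk-free rate defaults are assumptions for illustration; the document does not specify how Consilium computes them:

```typescript
// Value at Risk: the loss at the (1 - alpha) quantile of the
// empirical return distribution, reported as a positive number.
function historicalVaR(returns: number[], alpha = 0.95): number {
  const sorted = [...returns].sort((a, b) => a - b);
  const idx = Math.floor((1 - alpha) * sorted.length);
  return -sorted[idx];
}

// Conditional VaR (expected shortfall): the average loss over the
// worst (1 - alpha) fraction of outcomes.
function historicalCVaR(returns: number[], alpha = 0.95): number {
  const sorted = [...returns].sort((a, b) => a - b);
  const cutoff = Math.max(1, Math.floor((1 - alpha) * sorted.length));
  const tail = sorted.slice(0, cutoff);
  return -tail.reduce((s, r) => s + r, 0) / tail.length;
}

// Sharpe ratio: excess mean return per unit of return volatility
// (sample standard deviation).
function sharpeRatio(returns: number[], riskFree = 0): number {
  const mean = returns.reduce((s, r) => s + r, 0) / returns.length;
  const variance =
    returns.reduce((s, r) => s + (r - mean) ** 2, 0) / (returns.length - 1);
  return (mean - riskFree) / Math.sqrt(variance);
}
```

CVaR is always at least as large as VaR at the same confidence level, which is why deliberating models are asked for both: VaR alone says nothing about how bad the tail beyond the quantile gets.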
The output includes a comprehensive risk assessment with VaR/CVaR/Sharpe metrics, stress test results across multiple scenarios, regulatory compliance mapping against Basel III, SOX, Dodd-Frank, and MiFID II frameworks, hedging recommendations, and mandatory dissent. If one model identifies a systemic risk that others dismiss, that dissent is preserved with full quantitative backing — preventing the consensus bias that contributed to historical financial crises.
Example Prompt
Evaluate the risk profile of this investment portfolio under current market conditions: 40% US large-cap equities, 20% international developed markets, 15% emerging markets, 15% investment-grade bonds, 10% REITs. Consider interest rate sensitivity, geopolitical risk, and liquidity constraints.

Output
Risk assessment with VaR/CVaR/Sharpe metrics, stress test results across bull/base/bear/black-swan scenarios, Basel III/SOX/Dodd-Frank/MiFID II compliance mapping, hedging recommendations, and mandatory dissent on risk factors where models disagreed.
Why Deliberation Beats Single-Model
Jury format with MANDATORY_DISSENT prevents consensus bias in financial decisions. Single models tend to anchor on base-case scenarios. Three-model deliberation with forced dissent ensures tail risks and contrarian indicators are quantified and preserved in the final analysis — the kind of minority opinion that gets averaged away in traditional risk committees.
| Criterion | Weight |
|---|---|
| Quantitative Rigor | 30% |
| Regulatory Alignment | 25% |
| Risk Coverage | 20% |
| Scenario Analysis | 15% |
| Actionability | 10% |
```yaml
mode: "jury"
template: "finance"
models: 3
rounds: 3
require_dissent: true # MANDATORY
require_citations: false
mandatory_dissent: true
metrics: [VaR, CVaR, Sharpe]
compliance: ["Basel III", "SOX", "Dodd-Frank", "MiFID II"]
```