Use Cases
How teams use multi-agent deliberation to make better decisions. Each use case maps to a specific deliberation mode, evaluation rubric, and output format grounded in peer-reviewed research on multi-agent deliberation.
Code Review
Three models independently review your code, each generating a comprehensive analysis of potential issues. Unlike traditional code review tools that run static analysis, Consilium's code review puts models into adversarial positions where they actively attack each other's findings, uncovering issues that surface only under cross-examination. The Red Team framework ensures that every vulnerability claim is stress-tested before reaching the final report.
During the Red Team phase, models issue typed challenges categorized as SECURITY_VULN, LOGICAL_FLAW, EDGE_CASE, or ROBUSTNESS_TEST. A defender model must rebut each challenge with evidence — conceding valid points, refuting false positives, qualifying edge cases, or redirecting to more critical issues. This adversarial dynamic mirrors real security audits where penetration testers and defenders engage in structured conflict to harden systems.
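The challenge/rebuttal exchange above can be sketched as typed data. The type and field names below are illustrative assumptions, not Consilium's actual API:

```typescript
// Illustrative sketch of the Red Team challenge/rebuttal exchange.
// All names here are assumptions for illustration, not a real API.
type ChallengeType = "SECURITY_VULN" | "LOGICAL_FLAW" | "EDGE_CASE" | "ROBUSTNESS_TEST";
type RebuttalStance = "concede" | "refute" | "qualify" | "redirect";

interface Challenge {
  type: ChallengeType;
  claim: string;    // what the attacker asserts about the code
  evidence: string; // line reference or failing input
}

interface Rebuttal {
  stance: RebuttalStance;
  justification: string;
}

// Every challenge must receive a rebuttal before the judge synthesizes
// the final report; this helper surfaces any that were left unanswered.
function unansweredChallenges(
  challenges: Challenge[],
  rebuttals: Map<Challenge, Rebuttal>
): Challenge[] {
  return challenges.filter((c) => !rebuttals.has(c));
}
```

The four rebuttal stances mirror the options described above: conceding valid points, refuting false positives, qualifying edge cases, or redirecting to more critical issues.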
The judge model synthesizes all findings into a final vulnerability report with severity ratings (critical/high/medium/low), maps each finding to the original code location, and includes the defender's rebuttals. The result is a structured, auditable code review that catches 30-40% more issues than single-model review. Each finding is cross-referenced against OWASP Top 10, CWE identifiers, and SANS 25 categories where applicable.
Example Prompt
Review this authentication middleware for security vulnerabilities:
```javascript
async function authMiddleware(req, res, next) {
  const token = req.headers.authorization;
  const decoded = jwt.verify(token, process.env.JWT_SECRET);
  req.user = await User.findById(decoded.id);
  next();
}
```

Output
Vulnerability report with severity ratings (critical/high/medium/low), defender rebuttals for each finding, judge's final assessment with prioritized remediation steps, and OWASP/CWE cross-references.
Why Deliberation Beats Single-Model
Single models miss 30-40% of security issues. Cross-examination forces models to justify their findings under adversarial pressure, reducing false positives and surfacing hidden vulnerabilities that no single model catches alone. The Red Team structure ensures the defender cannot dismiss legitimate findings, while the attacker cannot inflate severity without evidence.
| Criterion | Weight |
|---|---|
| Security | 30% |
| Correctness | 25% |
| Performance | 20% |
| Maintainability | 15% |
| Style | 10% |
```yaml
mode: "red_team"
template: "code_review"
models: 3
rounds: 2
require_dissent: true
require_citations: false
```

Research Synthesis
Models explore different perspectives on complex research topics, each bringing independent analysis of available evidence. The Council mode ensures diverse viewpoints are represented before any synthesis occurs, preventing the premature convergence that plagues single-model summarization. Three models deliberate across three rounds, with each round building on the previous one's findings and challenges.
During deliberation, models challenge each other's source interpretations, flag potential biases in cited research, and identify gaps in evidence coverage. Each claim must be backed by specific evidence, and models rate their confidence in each assertion. The confidence-weighted voting system (Condorcet + Borda count) ensures well-supported conclusions carry more weight than speculative claims.
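The Borda half of the confidence-weighted voting described above can be sketched as follows. The interface names are illustrative, the Condorcet check is omitted, and scaling Borda points linearly by stated confidence is an assumption about how the weighting works:

```typescript
// Confidence-weighted Borda scoring: each model ranks the candidate
// conclusions best-first, and a rank's Borda points are scaled by the
// model's self-rated confidence in its own ranking.
interface Ballot {
  ranking: string[];  // conclusion IDs, best first
  confidence: number; // 0..1, the model's stated confidence
}

function weightedBorda(ballots: Ballot[]): Map<string, number> {
  const scores = new Map<string, number>();
  for (const { ranking, confidence } of ballots) {
    const n = ranking.length;
    ranking.forEach((id, i) => {
      // Standard Borda points (n-1 for first place, 0 for last),
      // scaled by the ballot's confidence.
      const pts = (n - 1 - i) * confidence;
      scores.set(id, (scores.get(id) ?? 0) + pts);
    });
  }
  return scores;
}
```

Under this scheme a confident, well-supported ranking contributes more to the outcome than a hedged, speculative one, which is the behavior the voting system is designed to produce.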
The final synthesis includes a comprehensive overview with inline citations, a section of flagged uncertainties where models disagreed, and confidence scores for each major conclusion. Dissenting views are preserved — if one model identified contradictory evidence, that perspective is included alongside the majority position. The output distinguishes between strong consensus, weak consensus, and active disagreement.
Example Prompt
Synthesize current research on transformer architecture efficiency improvements, including sparse attention mechanisms, mixture of experts, and linear attention variants. Compare their tradeoffs for production deployment.

Output
Comprehensive synthesis with inline citations, flagged uncertainties with confidence intervals, per-conclusion confidence scores, and preserved minority opinions where models disagreed on evidence interpretation.
Why Deliberation Beats Single-Model
Multiple models reduce single-model hallucination and confirmation bias by up to 15%. When one model cites a finding, others verify it independently — catching fabricated citations and misrepresented conclusions that single-model approaches propagate unchecked. Three rounds of cross-examination force progressively deeper engagement with the evidence.
| Criterion | Weight |
|---|---|
| Accuracy | 30% |
| Evidence Quality | 25% |
| Completeness | 20% |
| Bias Awareness | 15% |
| Citation Quality | 10% |
```yaml
mode: "council"
template: "research_synthesis"
models: 3
rounds: 3
require_dissent: true
require_citations: false
```

Risk Assessment
Five models participate in a structured Jury deliberation with MANDATORY_DISSENT reporting across three rounds. Every risk assessment must include minority opinions — no conclusion is presented as unanimous unless mathematically verified through convergence detection (Kendall tau + Jaccard + concession rate >= 0.85). This prevents the groupthink that makes single-model risk assessments dangerously overconfident.
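The convergence check named above can be sketched for a pair of models as the mean of three signals: Kendall tau rank correlation between their risk orderings, Jaccard overlap of the risk sets they identified, and the observed concession rate, with unanimity declared at 0.85 or above. Combining the three as a simple mean is an assumption for illustration; the document does not specify the exact formula:

```typescript
// Kendall tau over the items both rankings share, in -1..1.
function kendallTau(a: string[], b: string[]): number {
  const pos = new Map(b.map((id, i) => [id, i]));
  const common = a.filter((id) => pos.has(id));
  let concordant = 0;
  let discordant = 0;
  for (let i = 0; i < common.length; i++) {
    for (let j = i + 1; j < common.length; j++) {
      // a ranks common[i] before common[j]; concordant if b agrees.
      if (pos.get(common[i])! < pos.get(common[j])!) concordant++;
      else discordant++;
    }
  }
  const pairs = concordant + discordant;
  return pairs === 0 ? 1 : (concordant - discordant) / pairs;
}

// Jaccard overlap of the two identified risk sets, in 0..1.
function jaccard(a: Set<string>, b: Set<string>): number {
  const inter = [...a].filter((x) => b.has(x)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 1 : inter / union;
}

// Unanimity requires the combined agreement score to reach 0.85.
function isUnanimous(rankA: string[], rankB: string[], concessionRate: number): boolean {
  const tau01 = (kendallTau(rankA, rankB) + 1) / 2; // rescale tau to 0..1
  const score =
    (tau01 + jaccard(new Set(rankA), new Set(rankB)) + concessionRate) / 3;
  return score >= 0.85;
}
```

The point of the threshold is that partial agreement is never silently promoted to consensus: anything below 0.85 keeps the dissent visible in the report.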
Each model independently identifies risks, assesses likelihood and impact on standardized scales, and proposes mitigation strategies. During deliberation, models challenge each other's likelihood estimates and impact assessments, forcing quantitative justification. A model claiming 'low probability' must defend that assessment against adversarial questioning from four other models across three rounds.
The output is a structured risk matrix with likelihood/impact ratings for each identified risk, detailed mitigation strategies with implementation timelines, and mandatory minority opinions. If even one model identifies a catastrophic risk that others dismiss, that dissent is prominently featured in the final report rather than averaged away. Agglomerative clustering groups related risks and surfaces overlooked tail risks.
Example Prompt
Assess risks of migrating from AWS to multi-cloud architecture (AWS + GCP + Azure). Consider operational complexity, data sovereignty, cost implications, team skill gaps, vendor lock-in tradeoffs, and disaster recovery scenarios.

Output
Risk matrix with likelihood/impact ratings, mitigation strategies with implementation order, mandatory minority opinions from all five models, and a dissent report highlighting risks that only some models identified.
Why Deliberation Beats Single-Model
MANDATORY_DISSENT ensures no risks are overlooked due to groupthink. In single-model assessments, the model's training biases determine which risks are emphasized. Five-model Jury deliberation with forced dissent surfaces the full risk landscape — including tail risks that any individual model would dismiss as unlikely.
| Criterion | Weight |
|---|---|
| Risk Identification | 25% |
| Likelihood Assessment | 20% |
| Impact Analysis | 20% |
| Mitigation Quality | 20% |
| Compliance | 15% |
```yaml
mode: "jury"
template: "risk_assessment"
models: 5
rounds: 3
require_dissent: true # MANDATORY
require_citations: false
mandatory_dissent: true
```

Healthcare Decision Support
Healthcare deliberations enforce REQUIRE_DISSENT and REQUIRE_CITATIONS as non-negotiable constraints. Every diagnostic suggestion must cite specific clinical evidence, and every differential diagnosis must include dissenting opinions. This reflects the medical principle that premature diagnostic closure is among the leading causes of diagnostic error — a problem that single-model systems systematically amplify.
Models independently evaluate patient presentations, each generating a ranked differential diagnosis with supporting evidence across three rounds of deliberation. During cross-examination, models challenge each other's diagnostic reasoning — questioning whether symptoms truly support a proposed diagnosis, flagging overlooked conditions, and identifying potential drug interactions or contraindications that any single model might miss.
The output includes a ranked differential diagnosis list with evidence chains for each condition, safety flags for critical findings that require immediate action, and explicit dissenting opinions where models disagreed on diagnosis likelihood. Every recommendation includes a confidence score calibrated by how well it withstood cross-examination — models that changed their diagnosis under pressure receive lower calibration scores.
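One simple way to implement the calibration described above is a multiplicative discount per revision made under cross-examination. This sketch is an assumption for illustration, including the 0.8 discount factor; the document does not specify the calibration formula:

```typescript
// A diagnosis that survived cross-examination unchanged keeps its
// confidence; each revision under adversarial pressure discounts it.
// The 0.8 discount factor is an illustrative assumption.
function calibratedConfidence(
  baseConfidence: number,
  revisionsUnderPressure: number
): number {
  return baseConfidence * Math.pow(0.8, revisionsUnderPressure);
}
```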
Example Prompt
Evaluate differential diagnosis for a 45-year-old patient presenting with acute onset chest pain radiating to the left arm, diaphoresis, elevated troponin, but normal ECG. Consider cardiac, pulmonary, and gastrointestinal etiologies.

Output
Ranked differential diagnosis with evidence chains for each condition, safety flags for critical findings requiring immediate action, dissenting opinions on diagnosis likelihood, and confidence scores calibrated by cross-examination resilience.
Why Deliberation Beats Single-Model
Safety-critical decisions need transparent disagreement and evidence chains. A single model might miss a rare but life-threatening diagnosis. REQUIRE_DISSENT ensures uncommon conditions are considered, and REQUIRE_CITATIONS prevents hallucinated medical guidance. Three rounds of deliberation force models to defend their diagnostic reasoning under adversarial scrutiny.
| Criterion | Weight |
|---|---|
| Evidence Quality | 30% |
| Diagnostic Accuracy | 25% |
| Safety Considerations | 20% |
| Completeness | 15% |
| Actionability | 10% |
```yaml
mode: "council"
template: "healthcare"
models: 3
rounds: 3
require_dissent: true # MANDATORY
require_citations: true # REQUIRED
```

Legal Analysis
Legal analysis uses Blind mode with a dialectical structure: one model argues risk, another argues acceptability, and evaluation happens without knowledge of which model produced which argument. This eliminates the brand bias where evaluators unconsciously favor responses from models they perceive as more authoritative. MANDATORY_DISSENT ensures both conservative and permissive legal interpretations are fully explored across three rounds.
The dialectical format ensures both sides of every legal question are thoroughly explored. The risk-arguing model must identify every potential compliance gap, liability exposure, and regulatory risk. The acceptability-arguing model must demonstrate why current language or practices are legally defensible. Neither model knows the other's position during initial analysis, and the judge evaluates arguments in multiple orderings to prevent position bias.
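The order-debiased judging described above can be sketched as scoring both anonymized arguments in both presentation orders and averaging, so neither side benefits from going first. The `judge` callback below is a stand-in for a model call, not a real API:

```typescript
// A judge receives two anonymized arguments in some order and returns
// a score for each, in presentation order.
type Judge = (first: string, second: string) => [number, number];

// Score both arguments in both orders and average per argument,
// cancelling any systematic first-position (or last-position) bias.
function debiasedScores(judge: Judge, argA: string, argB: string): [number, number] {
  const [a1, b1] = judge(argA, argB); // A presented first
  const [b2, a2] = judge(argB, argA); // B presented first
  return [(a1 + a2) / 2, (b1 + b2) / 2];
}
```

With a judge that systematically favors whichever argument it reads first, the averaged scores come out equal for equally strong arguments, which is exactly the bias this evaluation order is meant to remove.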
The blind judge evaluates arguments purely on legal merit, producing clause-by-clause risk ratings, regulatory gap analysis, and recommended revisions with alternative language. The final output includes a dissent report showing where the risk and acceptability models fundamentally disagreed, ensuring stakeholders see the full spectrum of legal opinion rather than a false consensus that masks genuine legal ambiguity.
Example Prompt
Review this SaaS terms of service for GDPR compliance risks. Evaluate data processing clauses, cross-border transfer mechanisms, data subject rights implementation, and breach notification procedures against current EU regulatory requirements.

Output
Clause-by-clause risk ratings (high/medium/low), regulatory gaps mapped to specific GDPR articles, recommended revisions with alternative language, and a dissent report showing where risk and acceptability models disagreed.
Why Deliberation Beats Single-Model
Blind evaluation eliminates model bias — the judge cannot favor a 'brand name' model's analysis. The dialectical format with MANDATORY_DISSENT ensures both conservative and permissive legal interpretations are explored across three rounds, giving stakeholders the full picture rather than a single model's risk tolerance.
| Criterion | Weight |
|---|---|
| Legal Accuracy | 30% |
| Risk Identification | 25% |
| Regulatory Compliance | 20% |
| Practicality | 15% |
| Clarity | 10% |
```yaml
mode: "blind"
template: "legal"
models: 2
rounds: 3
require_dissent: true # MANDATORY
require_citations: true # REQUIRED
mandatory_dissent: true
```

Financial Analysis
Financial analysis uses Jury mode with MANDATORY_DISSENT and requires quantitative metrics in every assessment. Three models must provide specific numerical analysis — VaR (Value at Risk), CVaR (Conditional Value at Risk), and Sharpe ratios — rather than qualitative hand-waving. Compliance mapping covers Basel III, SOX, Dodd-Frank, and MiFID II frameworks. Every quantitative claim is stress-tested across three rounds of deliberation.
During deliberation, models challenge each other's quantitative assumptions. If one model projects 12% returns, another must stress-test that assumption against historical drawdown scenarios, current market volatility, and macroeconomic indicators. Scenario analysis is mandatory: bull case, base case, bear case, and black swan scenarios must all be addressed with specific numerical projections and probability-weighted outcomes.
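The three metrics named above can be sketched from a sample of periodic returns using the historical-simulation method. The method choice, confidence level, and risk-free rate defaults are assumptions for illustration; the document does not specify how Consilium computes them:

```typescript
// Value at Risk: the loss at the (1 - alpha) quantile of the
// empirical return distribution, reported as a positive number.
function historicalVaR(returns: number[], alpha = 0.95): number {
  const sorted = [...returns].sort((a, b) => a - b);
  const idx = Math.floor((1 - alpha) * sorted.length);
  return -sorted[idx];
}

// Conditional VaR (expected shortfall): the average loss over the
// worst (1 - alpha) fraction of outcomes.
function historicalCVaR(returns: number[], alpha = 0.95): number {
  const sorted = [...returns].sort((a, b) => a - b);
  const cutoff = Math.max(1, Math.floor((1 - alpha) * sorted.length));
  const tail = sorted.slice(0, cutoff);
  return -tail.reduce((s, r) => s + r, 0) / tail.length;
}

// Sharpe ratio: excess mean return per unit of return volatility
// (sample standard deviation).
function sharpeRatio(returns: number[], riskFree = 0): number {
  const mean = returns.reduce((s, r) => s + r, 0) / returns.length;
  const variance =
    returns.reduce((s, r) => s + (r - mean) ** 2, 0) / (returns.length - 1);
  return (mean - riskFree) / Math.sqrt(variance);
}
```

CVaR is always at least as large as VaR at the same confidence level, which is why deliberating models are asked for both: VaR alone says nothing about how bad the tail beyond the quantile gets.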
The output includes a comprehensive risk assessment with VaR/CVaR/Sharpe metrics, stress test results across multiple scenarios, regulatory compliance mapping against Basel III, SOX, Dodd-Frank, and MiFID II frameworks, hedging recommendations, and mandatory dissent. If one model identifies a systemic risk that others dismiss, that dissent is preserved with full quantitative backing — preventing the consensus bias that contributed to historical financial crises.
Example Prompt
Evaluate the risk profile of this investment portfolio under current market conditions: 40% US large-cap equities, 20% international developed markets, 15% emerging markets, 15% investment-grade bonds, 10% REITs. Consider interest rate sensitivity, geopolitical risk, and liquidity constraints.

Output
Risk assessment with VaR/CVaR/Sharpe metrics, stress test results across bull/base/bear/black-swan scenarios, Basel III/SOX/Dodd-Frank/MiFID II compliance mapping, hedging recommendations, and mandatory dissent on risk factors where models disagreed.
Why Deliberation Beats Single-Model
Jury format with MANDATORY_DISSENT prevents consensus bias in financial decisions. Single models tend to anchor on base-case scenarios. Three-model deliberation with forced dissent ensures tail risks and contrarian indicators are quantified and preserved in the final analysis — the kind of minority opinion that gets averaged away in traditional risk committees.
| Criterion | Weight |
|---|---|
| Quantitative Rigor | 30% |
| Regulatory Alignment | 25% |
| Risk Coverage | 20% |
| Scenario Analysis | 15% |
| Actionability | 10% |
```yaml
mode: "jury"
template: "finance"
models: 3
rounds: 3
require_dissent: true # MANDATORY
require_citations: false
mandatory_dissent: true
metrics: [VaR, CVaR, Sharpe]
compliance: ["Basel III", "SOX", "Dodd-Frank", "MiFID II"]
```