6 Vertical Templates

Use Cases

How teams use multi-agent deliberation to make better decisions. Each use case maps to a specific deliberation mode, evaluation rubric, and output format grounded in peer-reviewed research on multi-agent deliberation.

1. Code Review

Red Team Mode · code_review template · 3 models · 2 rounds
Challenge types: SECURITY_VULN, LOGICAL_FLAW, EDGE_CASE, ROBUSTNESS_TEST

Three models independently review your code, each generating a comprehensive analysis of potential issues. Unlike traditional code review tools that run static analysis, Consilium's code review puts models into adversarial positions where they actively attack each other's findings, uncovering issues that surface only under cross-examination. The Red Team framework ensures that every vulnerability claim is stress-tested before reaching the final report.

During the Red Team phase, models issue typed challenges categorized as SECURITY_VULN, LOGICAL_FLAW, EDGE_CASE, or ROBUSTNESS_TEST. A defender model must rebut each challenge with evidence — conceding valid points, refuting false positives, qualifying edge cases, or redirecting to more critical issues. This adversarial dynamic mirrors real security audits where penetration testers and defenders engage in structured conflict to harden systems.
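As an illustration, the typed challenge-and-rebuttal exchange could be modeled with a pair of small record types. The class names, fields, and the four rebuttal kinds below are a hypothetical sketch inferred from the description above, not Consilium's actual schema:

```python
from dataclasses import dataclass
from enum import Enum

class ChallengeType(Enum):
    SECURITY_VULN = "security_vuln"
    LOGICAL_FLAW = "logical_flaw"
    EDGE_CASE = "edge_case"
    ROBUSTNESS_TEST = "robustness_test"

class RebuttalKind(Enum):
    CONCEDE = "concede"      # challenge is valid
    REFUTE = "refute"        # false positive, with counter-evidence
    QUALIFY = "qualify"      # valid only under narrow conditions
    REDIRECT = "redirect"    # points to a more critical issue

@dataclass
class Challenge:
    kind: ChallengeType
    claim: str
    evidence: str

@dataclass
class Rebuttal:
    challenge: Challenge
    kind: RebuttalKind
    justification: str

# Example: an attacker flags the unguarded jwt.verify call from the
# middleware example, and the defender concedes the point.
challenge = Challenge(
    kind=ChallengeType.SECURITY_VULN,
    claim="jwt.verify throws on a missing or invalid token; unhandled, "
          "the request crashes instead of returning 401",
    evidence="No try/catch around jwt.verify; token may be undefined",
)
rebuttal = Rebuttal(challenge, RebuttalKind.CONCEDE,
                    "Valid: middleware should catch and respond with 401")
```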

The judge model synthesizes all findings into a final vulnerability report with severity ratings (critical/high/medium/low), maps each finding to the original code location, and includes the defender's rebuttals. The result is a structured, auditable code review that catches 30-40% more issues than single-model review. Each finding is cross-referenced against the OWASP Top 10, CWE identifiers, and the CWE/SANS Top 25 where applicable.

Example Prompt

Review this authentication middleware for security vulnerabilities:

async function authMiddleware(req, res, next) {
  const token = req.headers.authorization;
  const decoded = jwt.verify(token, process.env.JWT_SECRET);
  req.user = await User.findById(decoded.id);
  next();
}

Output

Vulnerability report with severity ratings (critical/high/medium/low), defender rebuttals for each finding, judge's final assessment with prioritized remediation steps, and OWASP/CWE cross-references.

Why Deliberation Beats Single-Model

Single models miss 30-40% of security issues. Cross-examination forces models to justify their findings under adversarial pressure, eliminating false positives and surfacing hidden vulnerabilities that no single model catches alone. The Red Team structure ensures the defender cannot dismiss legitimate findings, while the attacker cannot inflate severity without evidence.

Evaluation Rubric

Criterion        Weight
Security         30%
Correctness      25%
Performance      20%
Maintainability  15%
Style            10%
Template Config
mode: "red_team"
template: "code_review"
models: 3
rounds: 2
require_dissent: true
require_citations: false
2. Research Synthesis

Council Mode · research_synthesis template · 3 models · 3 rounds

Models explore different perspectives on complex research topics, each bringing independent analysis of available evidence. The Council mode ensures diverse viewpoints are represented before any synthesis occurs, preventing the premature convergence that plagues single-model summarization. Three models deliberate across three rounds, with each round building on the previous one's findings and challenges.

During deliberation, models challenge each other's source interpretations, flag potential biases in cited research, and identify gaps in evidence coverage. Each claim must be backed by specific evidence, and models rate their confidence in each assertion. The confidence-weighted voting system (Condorcet + Borda count) ensures well-supported conclusions carry more weight than speculative claims.
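To make the voting step concrete, here is a minimal sketch of a confidence-weighted Borda count (a full Condorcet check would add pairwise-majority comparisons on top of this). The function name and the exact weighting scheme are illustrative assumptions, not Consilium's published formula:

```python
def weighted_borda(rankings, confidences):
    """rankings: one ordered list of candidate conclusions per model,
    best first. confidences: one weight per model in [0, 1]."""
    scores = {}
    for ranking, conf in zip(rankings, confidences):
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            # Borda points: n-1 for first place down to 0 for last,
            # scaled by the model's confidence in its own ranking.
            scores[candidate] = scores.get(candidate, 0.0) + conf * (n - 1 - position)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Three models rank three candidate conclusions with differing confidence.
rankings = [["A", "B", "C"], ["B", "A", "C"], ["A", "C", "B"]]
confidences = [0.9, 0.6, 0.8]
result = weighted_borda(rankings, confidences)
# "A" wins: its two first-place votes come from the two most confident models.
```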

The final synthesis includes a comprehensive overview with inline citations, a section of flagged uncertainties where models disagreed, and confidence scores for each major conclusion. Dissenting views are preserved — if one model identified contradictory evidence, that perspective is included alongside the majority position. The output distinguishes between strong consensus, weak consensus, and active disagreement.

Example Prompt

Synthesize current research on transformer architecture efficiency improvements, including sparse attention mechanisms, mixture of experts, and linear attention variants. Compare their tradeoffs for production deployment.

Output

Comprehensive synthesis with inline citations, flagged uncertainties with confidence intervals, per-conclusion confidence scores, and preserved minority opinions where models disagreed on evidence interpretation.

Why Deliberation Beats Single-Model

Multiple models reduce single-model hallucination and confirmation bias by up to 15%. When one model cites a finding, others verify it independently — catching fabricated citations and misrepresented conclusions that single-model approaches propagate unchecked. Three rounds of cross-examination force progressively deeper engagement with the evidence.

Evaluation Rubric

Criterion         Weight
Accuracy          30%
Evidence Quality  25%
Completeness      20%
Bias Awareness    15%
Citation Quality  10%
Template Config
mode: "council"
template: "research_synthesis"
models: 3
rounds: 3
require_dissent: true
require_citations: false
3. Risk Assessment

Jury Mode · risk_assessment template · 5 models · 3 rounds

Five models participate in a structured Jury deliberation with MANDATORY_DISSENT reporting across three rounds. Every risk assessment must include minority opinions — no conclusion is presented as unanimous unless mathematically verified through convergence detection (Kendall tau + Jaccard + concession rate >= 0.85). This prevents the groupthink that makes single-model risk assessments dangerously overconfident.
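The convergence signals named above can be sketched in a few lines of pure Python. Kendall tau and Jaccard similarity are standard measures; how the three signals combine into a single score compared against 0.85 is an assumption here (a simple average), not necessarily Consilium's actual aggregation:

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall tau correlation between two rankings of the same items."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # A pair is concordant if both rankings order it the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    n = len(rank_a)
    return (concordant - discordant) / (n * (n - 1) / 2)

def jaccard(set_a, set_b):
    """Overlap between two sets of identified risks."""
    return len(set_a & set_b) / len(set_a | set_b)

def converged(rank_a, rank_b, risks_a, risks_b, concession_rate, threshold=0.85):
    # Assumed aggregation: plain average of the three signals.
    score = (kendall_tau(rank_a, rank_b)
             + jaccard(risks_a, risks_b)
             + concession_rate) / 3
    return score >= threshold
```

With identical rankings, identical risk sets, and full concession, the score is 1.0 and the deliberation is declared converged; any residual disagreement pulls the score below the threshold.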

Each model independently identifies risks, assesses likelihood and impact on standardized scales, and proposes mitigation strategies. During deliberation, models challenge each other's likelihood estimates and impact assessments, forcing quantitative justification. A model claiming 'low probability' must defend that assessment against adversarial questioning from four other models across three rounds.

The output is a structured risk matrix with likelihood/impact ratings for each identified risk, detailed mitigation strategies with implementation timelines, and mandatory minority opinions. If even one model identifies a catastrophic risk that others dismiss, that dissent is prominently featured in the final report rather than averaged away. Agglomerative clustering groups related risks and surfaces overlooked tail risks.
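A likelihood/impact matrix of this kind typically reduces to a simple banding function over two standardized 1-5 scales. The cut-offs below are illustrative defaults, not Consilium's actual thresholds:

```python
def risk_band(likelihood: int, impact: int) -> str:
    """Classify a risk on standardized 1-5 likelihood/impact scales.
    Band cut-offs here are illustrative, not the product's thresholds."""
    score = likelihood * impact  # ranges over 1..25
    if score >= 20:
        return "critical"
    if score >= 12:
        return "high"
    if score >= 6:
        return "medium"
    return "low"

# A risk judged 'likely' (4) with 'severe' impact (5) lands in the critical band.
assert risk_band(4, 5) == "critical"
assert risk_band(1, 3) == "low"
```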

Example Prompt

Assess risks of migrating from AWS to multi-cloud architecture (AWS + GCP + Azure). Consider operational complexity, data sovereignty, cost implications, team skill gaps, vendor lock-in tradeoffs, and disaster recovery scenarios.

Output

Risk matrix with likelihood/impact ratings, mitigation strategies with implementation order, mandatory minority opinions from all five models, and a dissent report highlighting risks that only some models identified.

Why Deliberation Beats Single-Model

MANDATORY_DISSENT ensures no risks are overlooked due to groupthink. In single-model assessments, the model's training biases determine which risks are emphasized. Five-model Jury deliberation with forced dissent surfaces the full risk landscape — including tail risks that any individual model would dismiss as unlikely.

Evaluation Rubric

Criterion              Weight
Risk Identification    25%
Likelihood Assessment  20%
Impact Analysis        20%
Mitigation Quality     20%
Compliance             15%
Template Config
mode: "jury"
template: "risk_assessment"
models: 5
rounds: 3
require_dissent: true # MANDATORY
require_citations: false
mandatory_dissent: true
4. Healthcare Decision Support

Council Mode · healthcare template · 3 models · 3 rounds

Healthcare deliberations enforce REQUIRE_DISSENT and REQUIRE_CITATIONS as non-negotiable constraints. Every diagnostic suggestion must cite specific clinical evidence, and every differential diagnosis must include dissenting opinions. This reflects the medical principle that premature diagnostic closure is a leading cause of diagnostic error — a problem that single-model systems systematically amplify.

Models independently evaluate patient presentations, each generating a ranked differential diagnosis with supporting evidence across three rounds of deliberation. During cross-examination, models challenge each other's diagnostic reasoning — questioning whether symptoms truly support a proposed diagnosis, flagging overlooked conditions, and identifying potential drug interactions or contraindications that any single model might miss.

The output includes a ranked differential diagnosis list with evidence chains for each condition, safety flags for critical findings that require immediate action, and explicit dissenting opinions where models disagreed on diagnosis likelihood. Every recommendation includes a confidence score calibrated by how well it withstood cross-examination — models that changed their diagnosis under pressure receive lower calibration scores.
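One plausible way to implement "calibrated by cross-examination resilience" is to decay a model's stated confidence by how much of the deliberation its diagnosis survived, with an extra penalty for revising the diagnosis under pressure. The function and its penalty factors are hypothetical, offered only to make the idea concrete:

```python
def calibrated_confidence(initial_conf: float,
                          rounds_survived: int,
                          total_rounds: int,
                          changed_diagnosis: bool) -> float:
    """Illustrative calibration: confidence decays when a diagnosis is
    revised under cross-examination; surviving more rounds preserves it.
    The 0.5 revision penalty is an assumption, not a published formula."""
    survival = rounds_survived / total_rounds
    penalty = 0.5 if changed_diagnosis else 1.0
    return round(initial_conf * survival * penalty, 3)

# A diagnosis held through all three rounds keeps its stated confidence;
# one revised mid-deliberation is reported with half the weight.
```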

Example Prompt

Evaluate differential diagnosis for a 45-year-old patient presenting with acute onset chest pain radiating to the left arm, diaphoresis, elevated troponin, but normal ECG. Consider cardiac, pulmonary, and gastrointestinal etiologies.

Output

Ranked differential diagnosis with evidence chains for each condition, safety flags for critical findings requiring immediate action, dissenting opinions on diagnosis likelihood, and confidence scores calibrated by cross-examination resilience.

Why Deliberation Beats Single-Model

Safety-critical decisions need transparent disagreement and evidence chains. A single model might miss a rare but life-threatening diagnosis. REQUIRE_DISSENT ensures uncommon conditions are considered, and REQUIRE_CITATIONS prevents hallucinated medical guidance. Three rounds of deliberation force models to defend their diagnostic reasoning under adversarial scrutiny.

Evaluation Rubric

Criterion              Weight
Evidence Quality       30%
Diagnostic Accuracy    25%
Safety Considerations  20%
Completeness           15%
Actionability          10%
Template Config
mode: "council"
template: "healthcare"
models: 3
rounds: 3
require_dissent: true # MANDATORY
require_citations: true # REQUIRED
6. Financial Analysis

Jury Mode · finance template · 3 models · 3 rounds
Metrics: VaR, CVaR, Sharpe · Compliance: Basel III, SOX, Dodd-Frank, MiFID II

Financial analysis uses Jury mode with MANDATORY_DISSENT and requires quantitative metrics in every assessment. Three models must provide specific numerical analysis — VaR (Value at Risk), CVaR (Conditional Value at Risk), and Sharpe ratios — rather than qualitative hand-waving. Compliance mapping covers Basel III, SOX, Dodd-Frank, and MiFID II frameworks. Every quantitative claim is stress-tested across three rounds of deliberation.

During deliberation, models challenge each other's quantitative assumptions. If one model projects 12% returns, another must stress-test that assumption against historical drawdown scenarios, current market volatility, and macroeconomic indicators. Scenario analysis is mandatory: bull case, base case, bear case, and black swan scenarios must all be addressed with specific numerical projections and probability-weighted outcomes.
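For reference, the three required metrics can be computed from a return series with plain historical simulation. This is a minimal pure-Python sketch: a production implementation would use a much longer return history, annualization, and a proper quantile estimator:

```python
from statistics import mean, stdev

def var_cvar_sharpe(returns, alpha=0.95, risk_free=0.0):
    """Historical-simulation VaR and CVaR at the alpha level, plus a
    per-period Sharpe ratio, from a series of periodic returns."""
    ordered = sorted(returns)                        # worst returns first
    cutoff = max(1, int(len(ordered) * (1 - alpha)))
    tail = ordered[:cutoff]                          # worst (1 - alpha) share
    var = -ordered[cutoff - 1]                       # loss at the alpha quantile
    cvar = -mean(tail)                               # expected loss beyond VaR
    sharpe = (mean(returns) - risk_free) / stdev(returns)
    return var, cvar, sharpe

# Ten monthly portfolio returns; with so few observations the single worst
# month (-5%) sets both the 95% VaR and the CVaR.
monthly = [0.02, -0.01, 0.03, -0.05, 0.01, 0.015, -0.02, 0.04, 0.005, -0.03]
var, cvar, sharpe = var_cvar_sharpe(monthly)
```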

The output includes a comprehensive risk assessment with VaR/CVaR/Sharpe metrics, stress test results across multiple scenarios, regulatory compliance mapping against Basel III, SOX, Dodd-Frank, and MiFID II frameworks, hedging recommendations, and mandatory dissent. If one model identifies a systemic risk that others dismiss, that dissent is preserved with full quantitative backing — preventing the consensus bias that contributed to historical financial crises.

Example Prompt

Evaluate the risk profile of this investment portfolio under current market conditions: 40% US large-cap equities, 20% international developed markets, 15% emerging markets, 15% investment-grade bonds, 10% REITs. Consider interest rate sensitivity, geopolitical risk, and liquidity constraints.

Output

Risk assessment with VaR/CVaR/Sharpe metrics, stress test results across bull/base/bear/black-swan scenarios, Basel III/SOX/Dodd-Frank/MiFID II compliance mapping, hedging recommendations, and mandatory dissent on risk factors where models disagreed.

Why Deliberation Beats Single-Model

Jury format with MANDATORY_DISSENT prevents consensus bias in financial decisions. Single models tend to anchor on base-case scenarios. Three-model deliberation with forced dissent ensures tail risks and contrarian indicators are quantified and preserved in the final analysis — the kind of minority opinion that gets averaged away in traditional risk committees.

Evaluation Rubric

Criterion             Weight
Quantitative Rigor    30%
Regulatory Alignment  25%
Risk Coverage         20%
Scenario Analysis     15%
Actionability         10%
Template Config
mode: "jury"
template: "finance"
models: 3
rounds: 3
require_dissent: true # MANDATORY
require_citations: false
mandatory_dissent: true
metrics: [VaR, CVaR, Sharpe]
compliance: [Basel III, SOX, Dodd-Frank, MiFID II]