Vertical Templates
Six preconfigured deliberation templates, each tuned for a specific domain. Every template defines the mode, rubric weights, system prompts, and evaluation criteria.
Overview
| Template | Mode | Default Models | Max Rounds |
|---|---|---|---|
| Code Review | Red Team | 3 | 2 |
| Research Synthesis | Council | 3 | 3 |
| Risk Assessment | Jury | 5 | 3 |
| Healthcare Diagnostics | Council | 3 | 3 |
| Legal Review (Dialectical) | Blind | 2 | 3 |
| Finance Risk Assessment | Jury | 3 | 3 |
Code Review
Three models independently review code, then adversarially attack each other's findings. Attackers probe for security vulnerabilities, logical flaws, edge cases, and robustness issues. Defenders respond to each attack. A judge evaluates the validity of attacks and strength of defenses.
Default Models: 3
Max Rounds: 2
Mode: Red Team
Evaluation Rubric
| Dimension | Weight | Evaluates |
|---|---|---|
| Security | 30% | Vulnerabilities, injection risks, auth flaws, data exposure |
| Correctness | 25% | Logic errors, off-by-one, null handling, race conditions |
| Performance | 20% | Time complexity, memory usage, unnecessary allocations |
| Maintainability | 15% | Code clarity, naming, structure, testability |
| Style | 10% | Consistency, formatting, idiomatic patterns |
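The rubric weights above sum to 100%, which suggests a weighted average as one natural way to combine per-dimension scores. The sketch below assumes that combination rule and a 0 to 10 scoring scale; neither is stated in this page, and `weighted_score` and `example_scores` are hypothetical names used only for illustration.

```python
# Weights copied from the Code Review rubric table.
CODE_REVIEW_RUBRIC = {
    "security": 0.30,
    "correctness": 0.25,
    "performance": 0.20,
    "maintainability": 0.15,
    "style": 0.10,
}


def weighted_score(scores, rubric):
    """Combine per-dimension scores into one number using rubric weights.

    Assumption: the overall score is a plain weighted sum. The actual
    aggregation used by the templates is not documented here.
    """
    missing = rubric.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weight * scores[dimension] for dimension, weight in rubric.items())


# Hypothetical judge scores on a 0-10 scale.
example_scores = {
    "security": 6,
    "correctness": 8,
    "performance": 7,
    "maintainability": 9,
    "style": 10,
}

overall = weighted_score(example_scores, CODE_REVIEW_RUBRIC)
print(f"overall score: {overall:.2f}")
```

Because Security carries the largest weight, a low security score pulls the overall result down more than an equally low style score would.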
Attack Categories
Security vulnerabilities, logical flaws, edge cases, and robustness issues.
Example Prompt
Review this authentication middleware for security vulnerabilities and suggest improvements
Output
Vulnerability report with severity ratings, defender's rebuttals, judge's final assessment, prioritized action items
Research Synthesis
Models explore different perspectives on complex research topics, challenge each other's sources and interpretations, and converge on well-supported conclusions. All claims must include citations. Uncertainties are explicitly flagged rather than glossed over.
Default Models: 3
Max Rounds: 3
Mode: Council
Evaluation Rubric
| Dimension | Weight | Evaluates |
|---|---|---|
| Accuracy | 30% | Factual correctness of claims and interpretations |
| Evidence Quality | 25% | Strength and relevance of cited sources |
| Completeness | 20% | Coverage of relevant perspectives and findings |
| Bias Awareness | 15% | Recognition of limitations and potential biases |
| Citation Quality | 10% | Proper attribution and source reliability |
Example Prompt
Synthesize current research on transformer architecture efficiency improvements published in 2024-2025
Output
Comprehensive synthesis with inline citations, flagged uncertainties, confidence scores, areas of consensus and disagreement
Risk Assessment
Five models evaluate risks with mandatory dissent reporting. Every assessment must explicitly identify both majority and minority positions. Models rate likelihood and impact for each risk, propose concrete mitigations, and map to compliance frameworks.
Default Models: 5
Max Rounds: 3
Mode: Jury
Evaluation Rubric
| Dimension | Weight | Evaluates |
|---|---|---|
| Risk Identification | 25% | Completeness of risk catalog, no blind spots |
| Likelihood Assessment | 20% | Accuracy of probability estimates |
| Impact Analysis | 20% | Severity scoring and cascading effects |
| Mitigation Quality | 20% | Feasibility and effectiveness of proposed controls |
| Compliance | 15% | Regulatory alignment and framework mapping |
Example Prompt
Assess risks of migrating our production database from PostgreSQL to a multi-region CockroachDB setup
Output
Risk matrix with likelihood/impact ratings, mitigation strategies per risk, mandatory minority opinions, compliance mapping
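To make the majority and minority reporting concrete, here is a minimal sketch of how five likelihood/impact ratings for a single risk might be bucketed and split. The 1 to 5 rating scale, the score thresholds, and the names `risk_level` and `summarize` are all assumptions for illustration, not part of the template.

```python
from collections import Counter


def risk_level(likelihood, impact):
    """Bucket a likelihood x impact product (each rated 1-5) into a level.

    The thresholds are illustrative assumptions, not values from the template.
    """
    score = likelihood * impact
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


def summarize(ratings):
    """Split per-model ratings into a majority level and dissenting models."""
    levels = {model: risk_level(l, i) for model, (l, i) in ratings.items()}
    # most_common picks the first-seen level on a tie; a real jury would
    # need an explicit tie-breaking rule.
    majority, _ = Counter(levels.values()).most_common(1)[0]
    minority = {m: lvl for m, lvl in levels.items() if lvl != majority}
    return {"majority": majority, "minority": minority}


# Hypothetical (likelihood, impact) ratings from five models for one risk.
ratings = {
    "model_a": (4, 4),
    "model_b": (4, 5),
    "model_c": (3, 5),
    "model_d": (2, 3),
    "model_e": (5, 4),
}

report = summarize(ratings)
print(report)
```

Keeping the dissenting models in the report, rather than discarding them once a majority is found, is what lets the output carry a minority opinion alongside the majority rating.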
Healthcare Diagnostics
Safety-critical deliberation with mandatory citations and dissent. Models provide differential diagnoses with evidence strength ratings. Red flags are automatically highlighted. Every claim must be backed by medical literature or clinical guidelines.
Default Models: 3
Max Rounds: 3
Mode: Council
Evaluation Rubric
| Dimension | Weight | Evaluates |
|---|---|---|
| Evidence Quality | 30% | Strength of clinical evidence, guideline adherence |
| Diagnostic Accuracy | 25% | Correctness of differential diagnosis |
| Safety Considerations | 20% | Red flag identification, contraindication awareness |
| Completeness | 15% | Coverage of differential, no missed diagnoses |
| Actionability | 10% | Clear next steps, testable hypotheses |
Example Prompt
Evaluate differential diagnosis for patient presenting with acute chest pain, elevated troponin, and normal ECG
Output
Ranked differential diagnosis with evidence strength, safety flags, dissenting opinions, recommended workup
Legal Review (Dialectical)
Dialectical format: one model argues the risk position (this is dangerous or non-compliant), and the other argues acceptability (this is fine or compliant). Both arguments are evaluated blind, with model identity stripped, so that the quality of the legal reasoning matters rather than the model brand. Mandatory dissent ensures both sides are fully explored.
Default Models: 2
Max Rounds: 3
Mode: Blind
Evaluation Rubric
| Dimension | Weight | Evaluates |
|---|---|---|
| Legal Accuracy | 30% | Correctness of legal interpretations and citations |
| Risk Identification | 25% | Completeness of risk and liability analysis |
| Regulatory Compliance | 20% | Alignment with applicable regulations |
| Practicality | 15% | Feasibility of recommended changes |
| Clarity | 10% | Clear, actionable language for non-lawyers |
Example Prompt
Review this SaaS terms of service for GDPR compliance risks and recommend specific clause revisions
Output
Clause-by-clause risk ratings, regulatory compliance gaps, recommended revisions with rationale, full dissent report
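Blind evaluation can be pictured as relabeling each argument with a neutral tag before the evaluator sees it. The sketch below is one way to do that under stated assumptions: the `anonymize` helper, the "Position A/B" labels, and the example model names are hypothetical and do not come from the library.

```python
import random


def anonymize(responses, seed=None):
    """Replace model names with neutral labels in shuffled order.

    Returns the blinded responses plus a key mapping each label back to
    its model, to be kept away from the evaluator until scoring is done.
    """
    rng = random.Random(seed)
    models = list(responses)
    rng.shuffle(models)  # hide any ordering hints about which model is which
    labels = [f"Position {chr(ord('A') + i)}" for i in range(len(models))]
    blinded = {label: responses[model] for label, model in zip(labels, models)}
    key = dict(zip(labels, models))
    return blinded, key


# Hypothetical arguments from the two sides of the dialectic.
responses = {
    "model_x": "Clause 7 creates liability under GDPR Art. 28.",
    "model_y": "Clause 7 is acceptable given the existing DPA.",
}

blinded, key = anonymize(responses, seed=0)
for label, text in blinded.items():
    print(label, "->", text)
```

This only hides the name attached to each response; text in which a model identifies itself would need separate scrubbing.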
Finance Risk Assessment
Quantitative financial risk analysis with mandatory dissent. Models evaluate market, credit, operational, and liquidity risks using standard metrics (VaR, CVaR, Sharpe ratio). Stress testing scenarios are required. Results are mapped to regulatory frameworks (Basel III, SOX, Dodd-Frank, MiFID II).
Default Models: 3
Max Rounds: 3
Mode: Jury
Evaluation Rubric
| Dimension | Weight | Evaluates |
|---|---|---|
| Quantitative Rigor | 30% | VaR, CVaR, Sharpe ratio, statistical validity |
| Regulatory Alignment | 25% | Basel III, SOX, Dodd-Frank, MiFID II compliance |
| Risk Coverage | 20% | Market, credit, operational, liquidity risk completeness |
| Scenario Analysis | 15% | Stress testing, tail risk, Monte Carlo quality |
| Actionability | 10% | Hedging strategies, portfolio adjustments, timelines |
Example Prompt
Evaluate the risk profile of this investment portfolio under current market conditions with stress scenarios
Output
Risk assessment with quantitative metrics, stress test results, regulatory mapping, hedging strategies, mandatory dissent
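For readers less familiar with the metrics named in this template, the following is a minimal sketch of historical-simulation VaR, CVaR, and a per-period Sharpe ratio using only the standard library. It illustrates the definitions and is not the template's own calculation; the function names, the sample returns, and the 80% confidence level are assumptions chosen to keep the example small.

```python
from statistics import mean, stdev


def historical_var(returns, confidence=0.95):
    """Historical VaR: the loss at the given confidence quantile.

    Losses are expressed as positive numbers.
    """
    losses = sorted(-r for r in returns)
    index = min(int(confidence * len(losses)), len(losses) - 1)
    return losses[index]


def historical_cvar(returns, confidence=0.95):
    """CVaR (expected shortfall): mean of losses at or beyond the VaR."""
    var = historical_var(returns, confidence)
    tail = [-r for r in returns if -r >= var]
    return mean(tail)


def sharpe_ratio(returns, risk_free=0.0):
    """Per-period Sharpe ratio: mean excess return over its sample std dev."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)


# Hypothetical per-period portfolio returns.
returns = [0.01, -0.02, 0.015, -0.05, 0.03, 0.005, -0.01, 0.02, -0.03, 0.01]

var_80 = historical_var(returns, confidence=0.80)
cvar_80 = historical_cvar(returns, confidence=0.80)
sharpe = sharpe_ratio(returns)
print(f"VaR(80%)={var_80:.3f} CVaR(80%)={cvar_80:.3f} Sharpe={sharpe:.3f}")
```

CVaR is never smaller than VaR at the same confidence level, since it averages only the losses in the tail that VaR marks off.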
Templates can be loaded programmatically through the template registry. Loading a template returns a configuration object containing its mode, rubric, system prompts, max rounds, and default model count.
from consilium.templates import get_template, TEMPLATES

# List all template keys
print(list(TEMPLATES.keys()))
# → ['code_review', 'research_synthesis', 'risk_assessment',
#    'healthcare', 'legal', 'finance']

# Load a template by key
template = get_template("code_review")
# Returns a configuration object of the form:
# {
#   topic: str,
#   mode: "redteam",
#   rubric: { security: 0.30, correctness: 0.25, ... },
#   system_prompts: { attacker: "...", defender: "..." },
#   max_rounds: 2,
#   default_models: 3
# }
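If you customize a loaded configuration, for example by adjusting rubric weights, a quick sanity check can catch mistakes before a deliberation runs. The sketch below assumes the configuration behaves like a plain dict with the fields listed above. `validate_template` is a hypothetical helper rather than part of `consilium`, and the example dict is a stand-in written by hand, not one loaded from the registry.

```python
import math


def validate_template(config):
    """Return a list of problems found in a template configuration dict."""
    problems = []
    weights = config.get("rubric", {})
    if not weights:
        problems.append("rubric is empty")
    elif not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        problems.append(f"rubric weights sum to {sum(weights.values()):.2f}, expected 1.00")
    if config.get("max_rounds", 0) < 1:
        problems.append("max_rounds must be at least 1")
    # Assumption: a deliberation needs at least two models.
    if config.get("default_models", 0) < 2:
        problems.append("default_models must be at least 2")
    return problems


# Stand-in mirroring the documented shape; not loaded from consilium.
example = {
    "mode": "redteam",
    "rubric": {
        "security": 0.30,
        "correctness": 0.25,
        "performance": 0.20,
        "maintainability": 0.15,
        "style": 0.10,
    },
    "system_prompts": {"attacker": "...", "defender": "..."},
    "max_rounds": 2,
    "default_models": 3,
}

print(validate_template(example))

# Raising one weight without lowering another breaks the sum-to-one check.
tweaked = {**example, "rubric": {**example["rubric"], "security": 0.50}}
print(validate_template(tweaked))
```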