
Deliberation Engine — Technical Overview

The deliberation engine is built on a LangGraph-based state machine that enforces formal debate protocols. Phases execute sequentially, and convergence is checked quantitatively before a debate is allowed to terminate.


A. State Machine

Phase Pipeline (Council/Deep/Jury/Blind)

PROPOSAL → CHALLENGE → REBUTTAL → EVALUATION → VOTING → AGGREGATION → CONVERGENCE → OUTPUT

After CONVERGENCE, the engine either loops back to PROPOSAL for another round or proceeds to OUTPUT. Round number increments after each convergence check.
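The round loop described above can be sketched as plain Python. This is a minimal illustration, not the actual LangGraph graph; the handler signatures (`run_phase`, `check_convergence`) and the state dict are hypothetical:

```python
# Hypothetical sketch of the round loop; phase names mirror the pipeline above.
PHASES = ["PROPOSAL", "CHALLENGE", "REBUTTAL", "EVALUATION",
          "VOTING", "AGGREGATION", "CONVERGENCE"]

def run_deliberation(run_phase, check_convergence, max_rounds=5):
    """run_phase(phase, state) returns updated state;
    check_convergence(state) -> bool decides loop-back vs OUTPUT."""
    state = {"round": 1}
    while True:
        for phase in PHASES:
            state = run_phase(phase, state)
        if check_convergence(state) or state["round"] >= max_rounds:
            return state          # proceed to OUTPUT
        state["round"] += 1       # loop back to PROPOSAL
```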

DeliberationState Fields


B. Voting Mechanisms (4 Algorithms)

Condorcet (Primary)

Checks if any candidate beats ALL others pairwise, weighted by confidence_weight.

For each pair (A, B):
  score_A = sum(confidence_weight where A ranked above B)
  score_B = sum(confidence_weight where B ranked above A)
  A wins if score_A > score_B
Winner = candidate that wins ALL pairwise comparisons
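A direct translation of the pairwise check, assuming ballots arrive as `(ranking, confidence_weight)` pairs with rankings listed best-first (the ballot representation here is an assumption):

```python
def condorcet_winner(ballots):
    """ballots: list of (ranking, confidence_weight); ranking = candidates best-first.
    Returns the candidate that wins every weighted pairwise matchup, or None."""
    candidates = set(ballots[0][0])
    for a in candidates:
        beats_all = True
        for b in candidates - {a}:
            # Weighted pairwise tally: who is ranked above whom, and by how much
            score_a = sum(w for r, w in ballots if r.index(a) < r.index(b))
            score_b = sum(w for r, w in ballots if r.index(b) < r.index(a))
            if score_a <= score_b:
                beats_all = False
                break
        if beats_all:
            return a
    return None  # cycle or tie: no Condorcet winner
```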

Borda Count (Scoring)

points[candidate] += (n - 1 - rank) × confidence_weight

Produces complete ranking of all candidates.
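The scoring rule above, sketched with the same assumed `(ranking, weight)` ballot shape (rank 0 = best, so the top choice earns n−1 points):

```python
def borda_ranking(ballots):
    """Weighted Borda count; returns candidates ordered best-first."""
    n = len(ballots[0][0])
    points = {}
    for ranking, weight in ballots:
        for rank, cand in enumerate(ranking):
            points[cand] = points.get(cand, 0.0) + (n - 1 - rank) * weight
    return sorted(points, key=points.get, reverse=True)
```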

Ranked Pairs (Tiebreaker)

When no Condorcet winner exists:

  • List all pairwise matchups with victory margins
  • Sort by margin (descending)
  • Lock edges while preventing cycles (topological sort)
  • Winner = candidate with no incoming locked edges
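The four steps above amount to Tideman's ranked pairs. A compact sketch under the same assumed ballot shape, with a DFS standing in for the topological cycle check:

```python
def ranked_pairs_winner(ballots):
    """Ranked pairs over weighted (ranking, weight) ballots."""
    cands = list(ballots[0][0])
    margin = {}
    # 1. List all pairwise matchups with victory margins
    for i, a in enumerate(cands):
        for b in cands[i + 1:]:
            sa = sum(w for r, w in ballots if r.index(a) < r.index(b))
            sb = sum(w for r, w in ballots if r.index(b) < r.index(a))
            if sa > sb:
                margin[(a, b)] = sa - sb
            elif sb > sa:
                margin[(b, a)] = sb - sa
    locked = set()

    def reachable(src, dst):
        # DFS over locked edges: would locking dst→src close a cycle?
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(v for u, v in locked if u == node)
        return False

    # 2. Sort by margin (descending); 3. lock edges that don't create cycles
    for (a, b) in sorted(margin, key=margin.get, reverse=True):
        if not reachable(b, a):
            locked.add((a, b))
    # 4. Winner = candidate with no incoming locked edges
    losers = {b for _, b in locked}
    return next(c for c in cands if c not in losers)
```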
Copeland (Comparative)

score = pairwise_wins - pairwise_losses

Range: -(n-1) to +(n-1). Used for analysis, not final selection.
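A sketch of the Copeland tally over the same assumed ballot shape; ties contribute nothing to either side:

```python
def copeland_scores(ballots):
    """Copeland score per candidate: pairwise wins minus pairwise losses."""
    cands = list(ballots[0][0])
    scores = {c: 0 for c in cands}
    for i, a in enumerate(cands):
        for b in cands[i + 1:]:
            sa = sum(w for r, w in ballots if r.index(a) < r.index(b))
            sb = sum(w for r, w in ballots if r.index(b) < r.index(a))
            if sa > sb:
                scores[a] += 1; scores[b] -= 1
            elif sb > sa:
                scores[b] += 1; scores[a] -= 1
    return scores
```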

Pipeline

Borda scores → full ranking → Condorcet check → Ranked Pairs fallback


C. Convergence Detection

Three Metrics

  • Kendall Tau: ranking correlation between rounds, normalized to [0,1]
  • Jaccard Similarity: word-set overlap between proposals across rounds
  • Concession Rate: fraction of rebuttals where models concede or qualify
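The first two metrics can be computed with a few lines of stdlib Python. This is an illustrative sketch: the whitespace tokenization and the (τ+1)/2 normalization are assumptions, not necessarily what the engine does internally:

```python
from itertools import combinations

def jaccard(text_a, text_b):
    """Word-set overlap between two proposals, in [0, 1]."""
    wa, wb = set(text_a.lower().split()), set(text_b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

def ranking_similarity(prev, curr):
    """Kendall tau between two rankings of the same items, mapped to [0, 1]."""
    pos_prev = {c: i for i, c in enumerate(prev)}
    pos_curr = {c: i for i, c in enumerate(curr)}
    concordant = discordant = 0
    for a, b in combinations(prev, 2):
        # Pair is concordant if both rankings order a and b the same way
        if (pos_prev[a] - pos_prev[b]) * (pos_curr[a] - pos_curr[b]) > 0:
            concordant += 1
        else:
            discordant += 1
    tau = (concordant - discordant) / (concordant + discordant)
    return (tau + 1) / 2  # normalize [-1, 1] → [0, 1]
```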

Formula

score = 0.40 × ranking_similarity
      + 0.35 × proposal_similarity
      + 0.25 × concession_rate

Termination Rules

  • round ≥ max_rounds → converged (forced)
  • round < 2 → not converged (need baseline)
  • score ≥ 0.85 → converged (consensus reached)
  • score < 0.85 → continue debating
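The formula and the termination rules combine into one small decision function. A sketch, with the rule precedence (max-rounds force-stop, then the round-2 baseline requirement) made explicit:

```python
def convergence_score(ranking_sim, proposal_sim, concession_rate):
    """Weighted blend of the three metrics, each in [0, 1]."""
    return 0.40 * ranking_sim + 0.35 * proposal_sim + 0.25 * concession_rate

def is_converged(round_num, score, max_rounds, threshold=0.85):
    if round_num >= max_rounds:
        return True            # forced termination
    if round_num < 2:
        return False           # need a baseline round to compare against
    return score >= threshold  # consensus reached vs. keep debating
```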

D. Dissent Detection (Agglomerative Clustering)

  • Build Jaccard similarity matrix between all proposals
  • Initialize each proposal as a singleton cluster
  • Iteratively merge the closest clusters while their similarity ≥ 0.5
  • Stop when no merges remain above the threshold
  • 1 cluster → consensus; 2+ clusters → dissent (largest = majority, rest = minority)
  • Output: { type, majority: { models, position, arguments }, minority: [...], disagreement_points }
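The clustering steps above can be sketched as follows. This is an illustrative single-linkage variant over a precomputed similarity matrix; the linkage choice is an assumption (the engine may use a different one):

```python
def cluster_proposals(sim, threshold=0.5):
    """Agglomerative clustering over a similarity matrix.
    sim[i][j] = Jaccard similarity of proposals i and j.
    Returns clusters of proposal indices, largest (majority) first."""
    clusters = [[i] for i in range(len(sim))]
    while True:
        best, pair = threshold, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: closest pair of members across the two clusters
                link = max(sim[i][j] for i in clusters[a] for j in clusters[b])
                if link >= best:
                    best, pair = link, (a, b)
        if pair is None:
            break  # no merge candidate at or above the threshold
        a, b = pair
        clusters[a] += clusters[b]
        del clusters[b]
    return sorted(clusters, key=len, reverse=True)
```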


E. Confidence Calibration

stability_score = avg(Jaccard(original_claims, post_challenge_claims))
concession_rate = count(CONCEDE rebuttals) / total_rebuttals
qualification_rate = count(QUALIFY rebuttals) / total_rebuttals

calibrated = stability_score × (1 - concession_rate) × (1 - 0.3 × qualification_rate)

Clamped to [0.0, 1.0]. Models that cave under pressure get lower scores.
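The calibration formula, sketched with counts as inputs; the zero-rebuttal guard is an assumption added to keep the example total:

```python
def calibrated_confidence(stability_score, concessions, qualifications, total_rebuttals):
    """Penalize models that concede or heavily qualify under challenge."""
    if total_rebuttals == 0:
        return max(0.0, min(1.0, stability_score))  # assumed: no rebuttals, no penalty
    concession_rate = concessions / total_rebuttals
    qualification_rate = qualifications / total_rebuttals
    raw = stability_score * (1 - concession_rate) * (1 - 0.3 * qualification_rate)
    return max(0.0, min(1.0, raw))  # clamp to [0.0, 1.0]
```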


F. Cost-Based Routing (Auto Mode)

Feature Extraction

token_count, has_code, is_factual, is_creative, is_analytical, has_stakes_keywords

Complexity Scoring

Base: <20 tokens → 0.1, ≤100 → 0.3, ≤500 → 0.5, >500 → 0.7

Adjustments: +0.2 code, +0.3 stakes, +0.2 analytical, +0.1 creative, -0.2 factual

Routing

score < 0.3 → Quick (1 model) | 0.3 ≤ score < 0.6 → Council (3 models) | score ≥ 0.6 → Deep (3-5 models)
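Scoring and routing together, as a sketch; the boolean feature flags mirror the extracted features above, and the return labels are illustrative names, not the engine's actual mode identifiers:

```python
def complexity_score(token_count, has_code=False, has_stakes=False,
                     is_analytical=False, is_creative=False, is_factual=False):
    """Base score from length, then the fixed adjustments from the table above."""
    if token_count < 20:
        base = 0.1
    elif token_count <= 100:
        base = 0.3
    elif token_count <= 500:
        base = 0.5
    else:
        base = 0.7
    return (base
            + 0.2 * has_code + 0.3 * has_stakes
            + 0.2 * is_analytical + 0.1 * is_creative
            - 0.2 * is_factual)

def route(score):
    if score < 0.3:
        return "quick"    # 1 model
    if score < 0.6:
        return "council"  # 3 models
    return "deep"         # 3-5 models
```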