Deliberation Engine — Technical Overview
The deliberation engine is built on a LangGraph-based state machine that enforces formal debate protocols. Phases execute sequentially, and a quantitative convergence score (Section C) decides whether the debate terminates or runs another round.
A. State Machine
Phase Pipeline (Council/Deep/Jury/Blind)
PROPOSAL → CHALLENGE → REBUTTAL → EVALUATION → VOTING → AGGREGATION → CONVERGENCE → OUTPUT
After CONVERGENCE, the engine either loops back to PROPOSAL for another round or proceeds to OUTPUT. Round number increments after each convergence check.
DeliberationState Fields
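The field list is not enumerated in this overview; the sketch below infers a plausible shape from the phases and metrics used in later sections, and wires the pipeline above as a LangGraph StateGraph. All field names and the placeholder handlers are illustrative assumptions, not the production schema.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class DeliberationState(TypedDict):
    # Plausible fields inferred from the phases and metrics in this doc.
    round: int                      # increments after each convergence check
    proposals: dict[str, str]       # model -> current proposal text
    rebuttals: list[dict]           # CONCEDE / QUALIFY / DEFEND records
    rankings: dict[str, list[str]]  # model -> ranked candidate list
    confidence: dict[str, float]    # model -> confidence_weight
    convergence_score: float        # weighted score from Section C
    max_rounds: int

def convergence_router(state: DeliberationState) -> str:
    # Termination rules from Section C: forced stop, baseline round, threshold.
    if state["round"] >= state["max_rounds"]:
        return "output"                     # converged (forced)
    if state["round"] < 2:
        return "proposal"                   # need a baseline round
    if state["convergence_score"] >= 0.85:
        return "output"                     # consensus reached
    return "proposal"                       # continue debating

graph = StateGraph(DeliberationState)
phases = ["proposal", "challenge", "rebuttal", "evaluation",
          "voting", "aggregation", "convergence", "output"]
for phase in phases:
    graph.add_node(phase, lambda state: state)  # placeholder phase handler

# Sequential edges through the pipeline, PROPOSAL -> ... -> CONVERGENCE.
for a, b in zip(phases[:-1], phases[:-1][1:]):
    graph.add_edge(a, b)

graph.set_entry_point("proposal")
graph.add_conditional_edges("convergence", convergence_router,
                            {"proposal": "proposal", "output": "output"})
graph.add_edge("output", END)
app = graph.compile()
```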
B. Voting Mechanisms (4 Algorithms)
Condorcet (Primary)
Checks if any candidate beats ALL others pairwise, weighted by confidence_weight.
For each pair (A, B):
score_A = sum(confidence_weight where A ranked above B)
score_B = sum(confidence_weight where B ranked above A)
A wins if score_A > score_B
Winner = candidate that wins ALL pairwise comparisons
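A minimal sketch of the weighted pairwise check, assuming each ballot is a (ranking, confidence_weight) pair with the ranking listed best-first over all candidates:

```python
def condorcet_winner(ballots, candidates):
    """ballots: list of (ranking, confidence_weight); ranking is best-first."""
    def beats(a, b):
        score_a = score_b = 0.0
        for ranking, weight in ballots:
            if ranking.index(a) < ranking.index(b):  # a ranked above b
                score_a += weight
            else:
                score_b += weight
        return score_a > score_b

    for cand in candidates:
        # Winner must beat ALL other candidates pairwise.
        if all(beats(cand, other) for other in candidates if other != cand):
            return cand
    return None  # no Condorcet winner; fall back to Ranked Pairs
```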
Borda Count (Scoring)
points[candidate] += (n - 1 - rank) × confidence_weight
Produces a complete ranking of all candidates. Here n is the number of candidates and rank is 0-indexed, so a top-ranked candidate earns (n - 1) × confidence_weight points.
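Under the same ballot shape as above, the weighted Borda count is a few lines:

```python
def borda_scores(ballots, candidates):
    """Weighted Borda count: top rank earns (n - 1) points per unit weight."""
    n = len(candidates)
    points = {c: 0.0 for c in candidates}
    for ranking, weight in ballots:
        for rank, cand in enumerate(ranking):  # rank is 0-indexed
            points[cand] += (n - 1 - rank) * weight
    # Complete ranking, best first.
    return sorted(points, key=points.get, reverse=True), points
```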
Ranked Pairs (Tiebreaker)
When no Condorcet winner exists: sort pairwise victories by weighted margin (largest first), lock each one into a directed graph unless it would create a cycle, and select the candidate with no locked-in defeats (standard Tideman Ranked Pairs).
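A sketch of the Tideman lock-in procedure, under the same ballot assumptions as the Condorcet sketch above:

```python
from itertools import permutations

def ranked_pairs_winner(ballots, candidates):
    # Weighted pairwise support for every ordered pair.
    margin = {(a, b): 0.0 for a, b in permutations(candidates, 2)}
    for ranking, weight in ballots:
        for a, b in permutations(candidates, 2):
            if ranking.index(a) < ranking.index(b):
                margin[(a, b)] += weight

    # Victories sorted by margin, largest first.
    pairs = sorted((p for p in margin if margin[p] > margin[(p[1], p[0])]),
                   key=lambda p: margin[p], reverse=True)

    locked = {c: set() for c in candidates}  # winner -> set of losers

    def reaches(src, dst):
        # DFS over locked edges; locking dst -> src would close a cycle.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(locked[node])
        return False

    for a, b in pairs:
        if not reaches(b, a):  # skip pairs that would create a cycle
            locked[a].add(b)

    # Winner: the candidate with no locked-in defeats (a DAG always has a source).
    beaten = {loser for losers in locked.values() for loser in losers}
    return next(c for c in candidates if c not in beaten)
```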
Copeland (Comparative)
score = pairwise_wins - pairwise_losses
Range: -(n-1) to +(n-1). Used for analysis, not final selection.
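A self-contained sketch of the score, again over weighted ballots:

```python
def copeland_scores(ballots, candidates):
    """score = pairwise_wins - pairwise_losses, in [-(n-1), +(n-1)]."""
    def beats(a, b):
        sa = sum(w for r, w in ballots if r.index(a) < r.index(b))
        sb = sum(w for r, w in ballots if r.index(b) < r.index(a))
        return sa > sb

    scores = {c: 0 for c in candidates}
    for a in candidates:
        for b in candidates:
            if a != b and beats(a, b):
                scores[a] += 1  # a win for a...
                scores[b] -= 1  # ...is a loss for b
    return scores
```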
Pipeline
Borda scores → full ranking → Condorcet check → Ranked Pairs fallback
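Composed from the sketches above, the pipeline reads as:

```python
def select_winner(ballots, candidates):
    # Full Borda ranking first (kept for reporting), then the Condorcet
    # check, falling back to Ranked Pairs when no candidate beats all others.
    ranking, _ = borda_scores(ballots, candidates)
    winner = condorcet_winner(ballots, candidates)
    if winner is None:
        winner = ranked_pairs_winner(ballots, candidates)
    return winner, ranking
```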
C. Convergence Detection
Three Metrics
- Kendall Tau: ranking correlation between rounds, normalized to [0,1]
- Jaccard Similarity: word-set overlap between proposals across rounds
- Concession Rate: fraction of rebuttals where models concede or qualify
Formula
score = 0.40 × ranking_similarity
+ 0.35 × proposal_similarity
+ 0.25 × concession_rate
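A sketch combining the three metrics with the stated weights; scipy's kendalltau supplies the ranking correlation, and the rebuttal field names are assumptions:

```python
from scipy.stats import kendalltau

def convergence_score(prev_ranking, curr_ranking,
                      prev_proposals, curr_proposals, rebuttals):
    # Kendall tau between rounds, normalized from [-1, 1] to [0, 1].
    positions = [prev_ranking.index(c) for c in curr_ranking]
    tau, _ = kendalltau(positions, range(len(curr_ranking)))
    ranking_similarity = (tau + 1) / 2

    # Mean word-set Jaccard overlap of each model's proposal across rounds.
    def jaccard(a, b):
        wa, wb = set(a.split()), set(b.split())
        return len(wa & wb) / len(wa | wb) if wa | wb else 1.0
    proposal_similarity = sum(
        jaccard(prev_proposals[m], curr_proposals[m]) for m in curr_proposals
    ) / len(curr_proposals)

    # Fraction of rebuttals that concede or qualify.
    conceding = sum(r["stance"] in ("CONCEDE", "QUALIFY") for r in rebuttals)
    concession_rate = conceding / len(rebuttals) if rebuttals else 0.0

    return (0.40 * ranking_similarity
            + 0.35 * proposal_similarity
            + 0.25 * concession_rate)
```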
Termination Rules
- round ≥ max_rounds → converged (forced)
- round < 2 → not converged (need baseline)
- score ≥ 0.85 → converged (consensus reached)
- score < 0.85 → continue debating

D. Dissent Detection (Agglomerative Clustering)
Output: { type, majority: { models, position, arguments }, minority: [...], disagreement_points }
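The clustering step might look like the following hand-rolled single-linkage agglomerative pass over proposal similarity; the production feature representation and linkage are not specified here, so treat every name as illustrative:

```python
def detect_dissent(positions, threshold=0.5):
    """positions: model -> proposal text. Start with one cluster per model,
    then repeatedly merge the most similar pair of clusters until no pair
    meets the similarity threshold (single linkage: closest members count)."""
    def jaccard(a, b):
        wa, wb = set(a.split()), set(b.split())
        return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

    clusters = [[m] for m in positions]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = max(jaccard(positions[a], positions[b])
                          for a in clusters[i] for b in clusters[j])
                if sim >= best:
                    best, pair = sim, (i, j)
        if pair is None:
            break  # no remaining pair clears the threshold
        i, j = pair
        clusters[i] += clusters.pop(j)

    clusters.sort(key=len, reverse=True)
    if len(clusters) == 1:
        return {"type": "unanimous",
                "majority": {"models": clusters[0]}, "minority": []}
    return {"type": "split",
            "majority": {"models": clusters[0]},
            "minority": [{"models": c} for c in clusters[1:]]}
```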
E. Confidence Calibration
stability_score = avg(Jaccard(original_claims, post_challenge_claims))
concession_rate = count(CONCEDE rebuttals) / total_rebuttals
qualification_rate = count(QUALIFY rebuttals) / total_rebuttals
calibrated = stability_score × (1 - concession_rate) × (1 - 0.3 × qualification_rate)
Clamped to [0.0, 1.0]. Models that cave under pressure get lower scores.
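A sketch of the calibration step, assuming rebuttal stances arrive as "CONCEDE"/"QUALIFY"/"DEFEND" strings and claims as plain text:

```python
def calibrate_confidence(original_claims, post_challenge_claims, rebuttals):
    def jaccard(a, b):
        wa, wb = set(a.split()), set(b.split())
        return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

    # How much of the original position survived the challenge phase.
    stability = sum(
        jaccard(o, p) for o, p in zip(original_claims, post_challenge_claims)
    ) / len(original_claims)

    total = len(rebuttals) or 1
    concession_rate = sum(r == "CONCEDE" for r in rebuttals) / total
    qualification_rate = sum(r == "QUALIFY" for r in rebuttals) / total

    calibrated = stability * (1 - concession_rate) * (1 - 0.3 * qualification_rate)
    return min(1.0, max(0.0, calibrated))  # clamp to [0.0, 1.0]
```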
F. Cost-Based Routing (Auto Mode)
Feature Extraction
token_count, has_code, is_factual, is_creative, is_analytical, has_stakes_keywords
Complexity Scoring
Base: <20 tokens → 0.1, 20-100 → 0.3, 101-500 → 0.5, >500 → 0.7
Adjustments: +0.2 code, +0.3 stakes, +0.2 analytical, +0.1 creative, -0.2 factual
Routing
score < 0.3 → Quick/1 model | 0.3 ≤ score < 0.6 → Council/3 models | score ≥ 0.6 → Deep/3-5 models
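Tying the three steps together; only the thresholds and adjustment weights come from the tables above, while the feature detectors and stakes keyword list are stand-in assumptions:

```python
import re

STAKES = ("legal", "medical", "financial", "security", "production")  # assumed list

def route(prompt: str):
    tokens = len(prompt.split())  # rough token-count proxy
    text = prompt.lower()

    # Base complexity from length...
    if tokens < 20:
        score = 0.1
    elif tokens <= 100:
        score = 0.3
    elif tokens <= 500:
        score = 0.5
    else:
        score = 0.7

    # ...then the keyword adjustments (detectors here are simplistic stand-ins).
    if "```" in prompt or re.search(r"\bdef |\bclass ", prompt):
        score += 0.2                                  # has_code
    if any(k in text for k in STAKES):
        score += 0.3                                  # has_stakes_keywords
    if any(k in text for k in ("analyze", "compare", "evaluate")):
        score += 0.2                                  # is_analytical
    if any(k in text for k in ("write a story", "brainstorm")):
        score += 0.1                                  # is_creative
    if prompt.rstrip().endswith("?") and tokens < 30:
        score -= 0.2                                  # is_factual

    if score < 0.3:
        return "quick", 1
    if score < 0.6:
        return "council", 3
    return "deep", 5  # Deep runs 3-5 models; 5 shown here
```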