Benchmarks

The Constellation produces benchmark data continuously as a side effect of operation: consensus scores, lane latencies, synthesis fallback levels, dissent preservation rates, and audit-chain coverage. We're preparing a published methodology and results catalog. Until then, this page documents what we measure and why.

In Preparation
This page is currently being written. Content will be published soon.

What We Measure

Every multi-lane query through the Constellation Control Window emits an audit record with the following signals:

  • Consensus score — agreement level across the 7 core lanes' responses to the same input.
  • Per-lane latency — end-to-end time per lane, including failover cooldowns.
  • Synthesis fallback level — which of the 4 synthesis tiers handled the response (Claude → OpenAI → rule-based → CPN escalation).
  • Dissent preservation rate — whether the unified synthesis kept attributable disagreement when lanes diverged.
  • Lola Rule tag compliance — count of [VERIFIED], [APPROXIMATE], and [HYPOTHESIZED] tags per synthesis.
  • Audit chain coverage — % of system actions with a complete provenance record in the JSONL chain.
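To make the shape of such a record concrete, the signals above could be carried in a structure like the following. This is a minimal sketch, not the Constellation's actual schema: the class name `AuditRecord`, every field name, the tier identifiers, and the `provenance` key are hypothetical, chosen only to mirror the list above. Only the signal names, the tier order, and the JSONL format come from this page.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import Dict, List, Optional

# Hypothetical identifiers for the 4 synthesis tiers, in the fallback order
# given above (Claude, OpenAI, rule-based, CPN escalation).
SYNTHESIS_TIERS = ("claude", "openai", "rule_based", "cpn_escalation")


@dataclass
class AuditRecord:
    """One audit record per multi-lane query (illustrative field names)."""
    query_id: str
    consensus_score: float              # agreement level across lanes
    lane_latency_ms: Dict[str, float]   # end-to-end time per lane
    synthesis_tier: str                 # which tier handled the response
    dissent_preserved: Optional[bool]   # None when lanes did not diverge
    tag_counts: Dict[str, int] = field(default_factory=dict)

    def to_jsonl(self) -> str:
        """Serialize to a single JSON line, rejecting unknown tiers."""
        if self.synthesis_tier not in SYNTHESIS_TIERS:
            raise ValueError(f"unknown synthesis tier: {self.synthesis_tier}")
        return json.dumps(asdict(self), sort_keys=True)


def audit_coverage(actions: List[dict]) -> float:
    """Percentage of actions carrying a non-empty 'provenance' entry.

    Treating "complete provenance record" as "non-empty 'provenance' key"
    is an assumption made for this sketch.
    """
    if not actions:
        return 0.0
    covered = sum(1 for action in actions if action.get("provenance"))
    return 100.0 * covered / len(actions)


record = AuditRecord(
    query_id="q-0001",
    consensus_score=0.86,
    lane_latency_ms={"lane_a": 412.0, "lane_b": 655.5},
    synthesis_tier="openai",
    dissent_preserved=True,
    tag_counts={"VERIFIED": 3, "APPROXIMATE": 1, "HYPOTHESIZED": 0},
)
line = record.to_jsonl()
```

Appending one such line per query to a file gives a JSONL chain that coverage and the other rates can be computed from after the fact.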

Methodology

The Constellation runs in active research operation. We do not maintain separate research and production stacks: benchmarking a stack that operators never use would be its own kind of dishonesty, distinct from the kind the architecture is designed to prevent. Benchmark numbers reflect the same code path that handles live operator queries.

Our methodology favors longitudinal measurement over snapshot results. Drift detection requires a window, not a moment. The published catalog will report distributions (p50/p95/p99) and trend lines, not single-run averages.
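As an illustration of distribution-based reporting, the sketch below computes p50/p95/p99 with the nearest-rank method and repeats the summary over consecutive windows, so that a trend can be read across windows rather than from one number. The function names, the choice of nearest-rank, and the fixed-size, non-overlapping windows are assumptions made for this example. The published methodology may define percentiles and windows differently.

```python
import math
from typing import Dict, List, Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("percentile of an empty sample set is undefined")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Report the distribution summary instead of a single average."""
    return {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}


def windowed_summaries(samples: Sequence[float], window: int) -> List[Dict[str, float]]:
    """Summarize consecutive, non-overlapping windows of `window` samples.

    A trailing partial window is dropped so every summary covers the
    same number of samples.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    full = len(samples) - len(samples) % window
    return [summarize(samples[i:i + window]) for i in range(0, full, window)]


latencies_ms = [float(v) for v in range(1, 101)]  # synthetic data, not results
overall = summarize(latencies_ms)
per_window = windowed_summaries(latencies_ms, window=50)
```

Comparing the entries of `per_window` in order is the simplest form of trend line. A drift detector would look for a sustained shift in p95 or p99 across several windows, not a single outlier.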

Results Catalog

Forthcoming
A published results catalog is in preparation. It will cover Light Guard latency distributions (p50/p95/p99 against the 25 ms hard cap), per-lane availability and failover frequency, consensus distribution histograms across thousands of queries, and longitudinal Lola Rule tag-compliance rates. Until then, see Research Focus for the threads behind these measurements and CCW Platform for how the data is generated.
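Two of the planned catalog entries can be sketched in a few lines: the share of Light Guard samples exceeding the 25 ms hard cap, and a fixed-width histogram of consensus scores. The function names are hypothetical, and the assumption that consensus scores fall in the range 0.0 to 1.0 is ours; this page does not state the score's scale. The input values are synthetic and are not results.

```python
from typing import List, Sequence

LIGHT_GUARD_CAP_MS = 25.0  # hard cap quoted in the text


def cap_breach_rate(latencies_ms: Sequence[float],
                    cap_ms: float = LIGHT_GUARD_CAP_MS) -> float:
    """Fraction of samples strictly above the cap."""
    if not latencies_ms:
        raise ValueError("no latency samples supplied")
    breaches = sum(1 for value in latencies_ms if value > cap_ms)
    return breaches / len(latencies_ms)


def consensus_histogram(scores: Sequence[float], bins: int = 10) -> List[int]:
    """Count scores into `bins` equal-width buckets over [0.0, 1.0].

    Assumes scores are normalized to [0.0, 1.0]; a score of exactly 1.0
    is placed in the last bucket.
    """
    if bins <= 0:
        raise ValueError("bins must be positive")
    counts = [0] * bins
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        index = min(int(score * bins), bins - 1)
        counts[index] += 1
    return counts


breach = cap_breach_rate([10.0, 24.9, 25.0, 31.2])
histogram = consensus_histogram([0.0, 0.05, 0.5, 0.99, 1.0], bins=10)
```

A sample exactly at 25.0 ms is counted as within the cap here. Whether the cap is inclusive is a detail the published methodology would need to settle.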