Argus — Token Governance, Measured

Post-run optimization report · 300 tasks · generated 2026-06-05 18:02 · fully reproducible (seed 7)

Cost vs naive frontier

−84%

$9.51 → $1.52, equal quality

Time vs naive frontier

−71%

2910s → 836s modeled

Pure efficiency

−35%

vs same models, no compression/cache/stop

Quality

0.930

Δ +0.005 vs flat-Sonnet

At this mix, Argus saves $26,626 per 1,000,000 tasks versus a naive "always use the frontier model" policy — at the same quality — and $2,738 per 1,000,000 tasks versus a perfectly model-matched team that simply lacks our compression, caching and loop-stopping layers. Linear projection from a 300-task, fully reproducible sample (seed 7).

Cost & time — before vs after

A naive team must choose between flat-Opus (pay frontier prices on every trivial task) and flat-Sonnet (cheaper, but it silently under-delivers on genuinely hard tasks). Argus routes each task to exactly the model it needs, then compresses, caches and stops stuck loops — delivering flat-Opus quality at a fraction of flat-Opus cost and time.

Wall-clock time is modeled from published per-model latency profiles (time-to-first-token + prefill + decode); see METHODOLOGY.md. All cost is list-price on both sides.

How Argus routed it & what quality resulted

Route	Tasks	Share
Sonnet	91	30%
Haiku	85	28%
Cache	102	34%
Stopped	12	4%
Opus	10	3%

How this dataset was generated

Everything is parametrized and seeded — no hidden randomness, no API spend in mock mode. Re-run with the same seed to reproduce every number on this page byte-for-byte.

n_tasks = 300seed = 7difficulty = easyreasoning_depth = 1.0cache_hit_rate = 0.12loop_rate = 0.04compressible_fraction = 0.65strong_model_fraction = None

Difficulty band	Required model	Share of unique tasks	Count
simple	claude-haiku-4-5	60%	151
medium	claude-sonnet-4-5	28%	71
hard	claude-sonnet-4-5	8%	20
expert	claude-opus-4-7	4%	10

Workload composition	Count
Unique tasks	252
Cache repeats (served from cache)	36
Stuck loops (SPRT force-stopped)	12
Total emitted	300

Routing policy (complexity → model): <0.33 → claude-haiku-4-5, <0.7 → claude-sonnet-4-5, ≥ → claude-opus-4-7. Full methodology in METHODOLOGY.md.