Argus — Token Governance, Measured
Post-run optimization report · 300 tasks · generated 2026-06-05 18:02 · fully reproducible (seed 7)
Cost vs naive frontier
−84%
$9.51 → $1.52, equal quality
Time vs naive frontier
−71%
2910s → 836s modeled
Pure efficiency
−35%
vs same models, no compression/cache/stop
Quality
0.930
Δ +0.005 vs flat-Sonnet
At this mix, Argus saves $26,626 per 1,000,000 tasks versus a naive "always use the frontier model" policy — at the same quality — and $2,738 per 1,000,000 tasks versus a perfectly model-matched team that simply lacks our compression, caching and loop-stopping layers. Linear projection from a 300-task, fully reproducible sample (seed 7).
Cost & time — before vs after
A naive team must choose between flat-Opus (pay frontier
prices on every trivial task) and flat-Sonnet (cheaper, but it silently
under-delivers on genuinely hard tasks). Argus routes each task to exactly the
model it needs, then compresses, caches and stops stuck loops — delivering flat-Opus quality at a
fraction of flat-Opus cost and time.
Wall-clock time is modeled from published per-model latency profiles (time-to-first-token + prefill + decode); see METHODOLOGY.md. All cost is list-price on both sides.
How Argus routed it & what quality resulted
| Route | Tasks | Share |
|---|
| Sonnet | 91 | 30% |
| Haiku | 85 | 28% |
| Cache | 102 | 34% |
| Stopped | 12 | 4% |
| Opus | 10 | 3% |
How this dataset was generated
Everything is parametrized and seeded — no hidden randomness, no API spend in mock mode.
Re-run with the same seed to reproduce every number on this page byte-for-byte.
n_tasks = 300seed = 7difficulty = easyreasoning_depth = 1.0cache_hit_rate = 0.12loop_rate = 0.04compressible_fraction = 0.65strong_model_fraction = None
| Difficulty band | Required model | Share of unique tasks | Count |
| simple | claude-haiku-4-5 | 60% | 151 |
| medium | claude-sonnet-4-5 | 28% | 71 |
| hard | claude-sonnet-4-5 | 8% | 20 |
| expert | claude-opus-4-7 | 4% | 10 |
| Workload composition | Count |
| Unique tasks | 252 |
| Cache repeats (served from cache) | 36 |
| Stuck loops (SPRT force-stopped) | 12 |
| Total emitted | 300 |
Routing policy (complexity → model): <0.33 → claude-haiku-4-5, <0.7 → claude-sonnet-4-5, ≥ → claude-opus-4-7. Full methodology in METHODOLOGY.md.