Mixture of Agents was slow. Waymark isn't.
Primaries run on Cerebras and Baseten, not on a frontier API queue. The whole panel returns before a single Opus call clears its throat.
Now in public beta: Mixture of Agents that runs on Cerebras and Baseten and escalates only when it must.
Mixture of Agents · v1.0
One prompt. Several models think in parallel. One clean answer comes out: better than any of them alone, and faster than the frontier model you can no longer get access to.
Available for macOS, Linux, and Windows.
```shell
curl -fsSL https://waymark.sh/install | bash
```
The mechanism
One question goes in. Several models think privately, in parallel. A sharp chair (the aggregator model) reads every answer and returns the best combined one, at the latency of the fast models rather than the slow ones.
1 prompt → many models think → 1 answer, better than any of them alone.
Opus & GPT join the panel only when the task earns the spend.
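The mechanism maps onto a simple fan-out and gather. The sketch below is a minimal illustration under stated assumptions, not Waymark's implementation: `ask_model` and `aggregate` are hypothetical stubs standing in for real model calls, and the model ids are only labels. Each panel member runs as a background job, `wait` blocks until all of them finish, and the chair then receives every draft at once.

```shell
#!/usr/bin/env bash
# Minimal mixture-of-agents sketch. ask_model and aggregate are hypothetical
# stand-ins: a real panel would send the prompt to each model's API instead.

ask_model() {  # $1 = model id, $2 = prompt
  printf '%s draft: %s\n' "$1" "$2"
}

aggregate() {  # $1 = prompt, remaining args = files holding one draft each
  local prompt="$1"; shift
  # A real chair would be another model call; here we only print its input.
  printf 'Combine these drafts into one best answer to: %s\n' "$prompt"
  cat "$@"
}

moa() {  # $1 = prompt, remaining args = panel model ids
  local prompt="$1"; shift
  local dir i=0 model
  dir=$(mktemp -d)
  for model in "$@"; do
    ask_model "$model" "$prompt" > "$dir/$i" &   # every panel member starts at once
    i=$((i + 1))
  done
  wait   # wall-clock cost is the slowest member, not the sum of all members
  aggregate "$prompt" "$dir"/*
  rm -rf "$dir"
}

moa "harden auth session handling" glm-5.2 gpt-oss-120b
```

Because the drafts are produced concurrently, the panel's wall-clock time tracks its slowest member rather than the sum of all members, which is why fast primaries matter for end-to-end latency.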
Watch it route
The two complaints about routed models — they're slow, and they're a black box — are exactly what Waymark fixes. Fast primaries on Cerebras and Baseten keep latency low; rare escalations only fire when a task earns them; and the routing is printed to your terminal as it happens.
```
$ waymark run "harden auth session handling"
→ plan    harden auth session handling   GLM-5.2        Baseten      1.6s
→ edit    src/auth/session.ts            GPT-OSS-120B   Cerebras     2.1s
→ edit    src/auth/tokens.ts             GPT-OSS-120B   Cerebras     1.9s
→ test    pnpm test auth                 GLM-5.2        Baseten      2.4s
  review needs deeper reasoning: escalating GLM-5.2 → GPT-5.5 (Codex) · rare
→ review  diff · 3 files                 GPT-5.5        escalation   6.3s
panel agreed · 1 answer returned · you saw every hop                 14.3s
```
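The 14.3s total in this trace equals the sum of the five hop times (1.6 + 2.1 + 1.9 + 2.4 + 6.3), and the single escalated hop accounts for 6.3s of it. A small awk tally makes that split explicit. The one-hop-per-line input format below is a hypothetical simplification of the terminal output, not a documented Waymark log format.

```shell
#!/usr/bin/env bash
# Tally a routing trace: total wall-clock time and how much of it escalation took.
# Input format (hypothetical): "step model provider seconds", one hop per line.
tally() {
  awk '{
    total += $4; hops++
    if ($3 == "escalation") { esc += $4; n_esc++ }
  } END {
    printf "total %.1fs, escalation %.1fs (%d of %d hops)\n", total, esc, n_esc, hops
  }'
}

tally <<'EOF'
plan   GLM-5.2      Baseten    1.6
edit   GPT-OSS-120B Cerebras   2.1
edit   GPT-OSS-120B Cerebras   1.9
test   GLM-5.2      Baseten    2.4
review GPT-5.5      escalation 6.3
EOF
```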
Why Waymark
A mix of today's models beats the best single model you can no longer get — with no waiting, no gated access, and none of the latency that made the first wave of orchestrators painful to use.
Primaries run on Cerebras and Baseten, not on a frontier API queue. The whole panel returns before a single Opus call clears its throat.
Cheap fast models carry the load. Waymark escalates to Opus or GPT only when the task earns it — so you pay frontier prices on the rare turn that needs them.
Every model that ran, every vote it cast, every escalation it triggered — all logged and replayable. The opacity people hate in routed models, gone.
Average pass@1 on DeepSWE · higher is better

| Harness | Model | Setting | Cost ($/task) |
|---|---|---|---|
| Waymark | MoA · GLM + GPT-OSS | auto-escalate | $1.20 |
| Claude Code | Fable 5 | max · gated | n/a |
| Codex | GPT-5.5 | xhigh | $9.40 |
| Codex | GPT-5.5 | medium | $4.10 |
| Claude Code | Opus 4.8 | max | $11.80 |
| Claude Code | Opus 4.8 | medium | $6.20 |
| Opencode | Opus 4.7 | medium | $3.90 |
| Cursor CLI | GPT-5.5 | medium | $3.40 |
| Claude Code | GLM-5.2 | n/a | $1.10 |
| Claude Code | GLM-5.1 | n/a | $0.90 |
| Gemini CLI | Gemini 3.1 Pro | high | $2.80 |
Fable-class score, GLM-class cost. Waymark tops the board at a fraction of the per-task spend — and Fable itself is gated to 100 partners. The panel ships what the frontier won't.
Illustrative pass@1 on DeepSWE; cost is blended $/task at default settings. Independent eval forthcoming.
Pricing
Cheap open-weight models carry the vast majority of every task. Frontier models are billed only on the rare turns they're summoned. Your headline price is the amortized average of the two: a single, honest blended rate.
Blended across open + closed models. No per-model accounting.

| Tier | Models | $ per 1M input tokens | $ per 1M output tokens |
|---|---|---|---|
| Open-source primaries (~90% of tokens) | GLM-5.2 · GPT-OSS-120B | $0.40 | $1.44 |
| Closed-source escalation (~10% of tokens) | Opus 4.8 · GPT-5.5 | $8.13 | $42.50 |
| Waymark blended (amortized average) | what you actually pay | $1.20 | $5.50 |
Blend = 0.9 × open-weight rate + 0.1 × frontier rate. Input: 0.9(0.40) + 0.1(8.13) = 1.173 ≈ $1.20. Output: 0.9(1.44) + 0.1(42.50) = 5.546 ≈ $5.50. Both are rounded to the nearest $0.10.
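The blend is a token-share-weighted average, so it is easy to recompute or to re-run with a different escalation share. A minimal sketch, assuming the 90/10 split and the per-tier rates from the table; `blend` and `to_dime` are illustrative helper names, not part of the Waymark CLI.

```shell
#!/usr/bin/env bash
# Blended $/1M-token rate: (1 - f) * open-weight rate + f * frontier rate,
# where f is the share of tokens that escalate (0.1 by default, per the table).
blend() {  # $1 = open-weight rate, $2 = frontier rate, $3 = frontier share
  awk -v o="$1" -v c="$2" -v f="${3:-0.1}" \
    'BEGIN { printf "%.3f\n", (1 - f) * o + f * c }'
}

# Round a dollar amount to the nearest $0.10, as the headline prices are.
to_dime() {
  awk -v x="$1" 'BEGIN { printf "%.2f\n", int(x * 10 + 0.5) / 10 }'
}

in_rate=$(blend 0.40 8.13)    # 1.173
out_rate=$(blend 1.44 42.50)  # 5.546
echo "input  \$${in_rate} -> \$$(to_dime "$in_rate")"
echo "output \$${out_rate} -> \$$(to_dime "$out_rate")"
```

Passing a third argument, for example `blend 0.40 8.13 0.2`, shows how the input rate moves if a larger share of tokens escalates.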
One command
Flip the whole panel on with one command. Pick a preset, or pin your own reference models and aggregator. Provider-agnostic — plug in Baseten, Cerebras, Anthropic, OpenAI, or your own endpoint.
```
$ waymark moa                              # turn the panel on
$ waymark model default --provider moa     # route normal turns through the panel
$ waymark model review --provider moa      # use the panel only for reviews
$ waymark presets                          # fast-pair · opus-aggregator · budget
panel: glm-5.2 + gpt-oss-120b → aggregator
escalation: opus-4.8 (armed, idle)
✓ mixture of agents active
```
No harness required
Don't want to use our harness? Point any OpenAI-compatible client at Waymark and the whole mixture of agents answers behind a single model id. Swap your base_url and model — nothing else changes.
- waymark-moa: balanced panel · auto-escalates
- waymark-fast: primaries only · never escalates
- waymark-max: aggressive escalation for hard tasks

Routing decisions come back in x-waymark-route response headers, so the panel stays transparent even over the API.
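Over plain HTTP, the routing trace lives in the `x-waymark-route` response header. Its value format is not documented on this page, so the sketch below simply prints whatever the header contains. `route_header` is a hypothetical helper that reads a raw header dump such as the one curl writes with its `-D` (`--dump-header`) option.

```shell
#!/usr/bin/env bash
# Print the value of every x-waymark-route header in a saved header dump
# (for example one written by curl -D). Header names are case-insensitive,
# and raw HTTP headers end in CRLF, so strip the \r first.
route_header() {  # $1 = file holding raw HTTP response headers
  tr -d '\r' < "$1" |
    awk 'tolower($0) ~ /^x-waymark-route:/ { sub(/^[^:]*:[ \t]*/, ""); print }'
}
```

For example, add `-D headers.txt` to the curl request and then run `route_header headers.txt`.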
```shell
curl --location 'https://api.waymark.sh/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${WAYMARK_API_KEY}" \
  --data '{
    "model": "waymark-moa",
    "max_completion_tokens": 1024,
    "temperature": 0.2,
    "top_p": 1,
    "stream": false,
    "reasoning_effort": "medium",
    "messages": [
      { "role": "user", "content": "Why is fast inference important?" }
    ]
  }'
```

One bet, two modalities
Speechify's research bet is the same across every modality: preference beats any single model's self-report. A blind judge picking the best of many wins — whether the output is a voice or a diff.
→ SIMBA 3.0 ranks #1
→ Waymark beats any single model
Speechify AI Research Lab
Speechify AI advances speech synthesis, voice cloning, and emotional expression — building voice AI indistinguishable from humans. The way we prove it is the same way Waymark picks an answer: put the options in front of a judge who doesn't know who's who, and let preference decide.
Speechify Voice Arena · blind A/B · ELO

| Model | Provider | ELO |
|---|---|---|
| SIMBA 3.0 | Speechify | 1212 |
| Gemini 3.1 Flash TTS | Google | 1205 |
| Realtime TTS 1.5 | Inworld | 1199 |
| Eleven v3 | ElevenLabs | 1180 |
| Speech 2.8 HD | MiniMax | 1163 |
| TTS-1-HD | OpenAI | 1096 |

Illustrative ELO from blind pairwise listening tests. Methodology mirrors the Artificial Analysis TTS arena.
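The page does not spell out the rating math. As a minimal sketch, assuming the textbook Elo update with a K-factor of 32 (our assumption, not the arena's published setting), one blind A/B comparison moves two ratings like this; `elo` is an illustrative helper name.

```shell
#!/usr/bin/env bash
# Standard Elo update for one blind A/B comparison. K = 32 is an assumption;
# the arena's actual K-factor and tie handling are not stated on this page.
elo() {  # $1 = rating A, $2 = rating B, $3 = score for A (1 win, 0.5 tie, 0 loss)
  awk -v ra="$1" -v rb="$2" -v s="$3" -v k=32 'BEGIN {
    ea = 1 / (1 + 10 ^ ((rb - ra) / 400))   # expected score for A
    printf "%.0f %.0f\n", ra + k * (s - ea), rb - k * (s - ea)
  }'
}

elo 1200 1200 1   # A wins an evenly matched comparison
```

Each listener preference nudges the winner up and the loser down by the same amount, with bigger moves for upsets, so the ranking is driven by preference rather than by any model's self-report.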
The other half of agents
Build real-time voice agents that listen, think, and speak — powered by the same #1-ranked speech models and the same fast-inference stack that keeps Waymark's panel quick. Tools, knowledge, memory, and telephony through one API, at one all-in rate.
Waymark is fast because of the silicon underneath it.

| Provider | Role | When it runs |
|---|---|---|
| Cerebras | wafer-scale inference | Primary · always on |
| Baseten | dedicated GLM endpoints | Primary · always on |
| Anthropic | Opus escalation | On escalation |
| OpenAI | GPT escalation | On escalation |
Install Waymark and point your agent at the panel. Frontier-class output, fast, today — no gated access required.