Now in public beta — Mixture of Agents that runs on Cerebras & Baseten, escalates only when it must.


Mixture of Agents · v1.0

A panel of experts beats one genius.

One prompt. Several models think in parallel. One clean answer comes out: better than any of them alone, and faster than the frontier model you can't get access to anymore.

Available for macOS, Linux, and Windows.

curl -fsSL https://waymark.sh/install | bash

Or read the documentation

The mechanism

The reference models are the panel. The aggregator is the chair.

One question goes in. Several models think privately, in parallel. A sharp chair reads every answer and returns the best combined one — at the latency of the fast models, not the slow ones.

Your 1 prompt (one ask) → reference models → Aggregator → 1 better answer
Reference models: GLM-5.2 (Baseten · fast) · GPT-OSS 120B (Cerebras · ~2,000 tok/s) · Opus 4.8 (escalation · rare)
Private analysis · the chair never sees who said what

1 prompt → many models think → 1 answer, better than any of them alone.
Opus & GPT join the panel only when the task earns the spend.
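The flow above can be sketched in a few lines of Python. This is a minimal illustration of the mixture-of-agents pattern, not Waymark's implementation: `run_panel`, the stub models, and the aggregator prompt wording are all hypothetical, and the stubs return canned strings instead of calling real endpoints.

```python
import asyncio
import random
from typing import Awaitable, Callable, Dict, Optional

# A "model" here is any async function from prompt text to answer text.
Model = Callable[[str], Awaitable[str]]


async def run_panel(
    prompt: str,
    panel: Dict[str, Model],
    aggregator: Model,
    seed: Optional[int] = None,
) -> str:
    # Every reference model answers the same prompt concurrently, so the
    # wait is set by the slowest panel member, not the sum of all of them.
    answers = await asyncio.gather(*(model(prompt) for model in panel.values()))

    # Drop the model names and shuffle the order, so the chair
    # cannot tell who said what.
    blind = list(answers)
    random.Random(seed).shuffle(blind)
    numbered = "\n".join(f"Answer {i + 1}: {text}" for i, text in enumerate(blind))

    return await aggregator(
        f"Question: {prompt}\n{numbered}\nReturn the best combined answer."
    )


# Hypothetical stand-ins for real endpoints.
async def stub_glm(prompt: str) -> str:
    return "rotate session tokens on login"


async def stub_gpt_oss(prompt: str) -> str:
    return "set HttpOnly and Secure on the session cookie"


async def stub_chair(prompt: str) -> str:
    # Toy aggregator: merge the candidate answers it was shown.
    candidates = [
        line.split(": ", 1)[1]
        for line in prompt.splitlines()
        if line.startswith("Answer ")
    ]
    return "; ".join(sorted(candidates))


if __name__ == "__main__":
    panel = {"glm-5.2": stub_glm, "gpt-oss-120b": stub_gpt_oss}
    print(asyncio.run(run_panel("harden auth session handling", panel, stub_chair)))
```

The two design points worth noticing are the concurrent fan-out (`asyncio.gather`) and the anonymized, shuffled answer list handed to the aggregator. Escalation is left out of this sketch because the page does not say how the trigger is decided.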

Watch it route

You can see the panel think.

The two complaints about routed models — they're slow, and they're a black box — are exactly what Waymark fixes. Fast primaries on Cerebras and Baseten keep latency low; rare escalations only fire when a task earns them; and the routing is printed to your terminal as it happens.

Sub-5 min: tasks stay sub-5 min
And finish faster than a lone Opus or Codex call when the frontier isn't needed.
Long tasks: beat the frontier
When a job is genuinely hard, the panel out-reasons any single model on the board.
Every hop: visible as it happens
No black box. Watch each model, each escalation, each handoff, live and replayable.
waymark — agent
routing live

$ waymark run "harden auth session handling"

plan    harden auth session handling   GLM-5.2        Baseten      1.6s
edit    src/auth/session.ts            GPT-OSS-120B   Cerebras     2.1s
edit    src/auth/tokens.ts             GPT-OSS-120B   Cerebras     1.9s
test    pnpm test auth                 GLM-5.2        Baseten      2.4s

review needs deeper reasoning: escalating GLM-5.2 → GPT-5.5 (Codex) · rare

review  diff · 3 files                 GPT-5.5        escalation   6.3s

panel agreed · 1 answer returned · you saw every hop   14.3s

Why Waymark

Stop chasing the model.
Build the system around it.

A mix of today's models beats the best single model you can no longer get — with no waiting, no gated access, and none of the latency that made the first wave of orchestrators painful to use.

Mixture of Agents was slow. Waymark isn't.

Primaries run on Cerebras and Baseten, not on a frontier API queue. The whole panel returns before a single Opus call clears its throat.

~4s median panel, simple task

Frontier answers, GLM-class bill.

Cheap fast models carry the load. Waymark escalates to Opus or GPT only when the task earns it — so you pay frontier prices on the rare turn that needs them.

~6× cheaper than an Opus-max run

No black box. A glass one.

Every model that ran, every vote it cast, every escalation it triggered — all logged and replayable. The opacity people hate in routed models, gone.

100% of agent calls inspectable

DeepSWE Benchmark Score

Average pass@1 on DeepSWE · higher is better

Legend: Waymark · Frontier · Open weights

Harness      Model                 Setting        pass@1  $/task
Waymark      MoA · GLM + GPT-OSS   auto-escalate  68      $1.20
Claude Code  Fable 5               max · gated    66      -
Codex        GPT-5.5               xhigh          64      $9.40
Codex        GPT-5.5               medium         57      $4.10
Claude Code  Opus 4.8              max            56      $11.80
Claude Code  Opus 4.8              medium         49      $6.20
Opencode     Opus 4.7              medium         40      $3.90
Cursor CLI   GPT-5.5               medium         37      $3.40
Claude Code  GLM-5.2               -              29      $1.10
Claude Code  GLM-5.1               -              19      $0.90
Gemini CLI   Gemini 3.1 Pro        high           14      $2.80

Fable-class score, GLM-class cost. Waymark tops the board at a fraction of the per-task spend — and Fable itself is gated to 100 partners. The panel ships what the frontier won't.


Illustrative pass@1 on DeepSWE; cost is blended $/task at default settings. Independent eval forthcoming.

Pricing

Pay the blended rate,
not the frontier rate.

Cheap open-weights carry the vast majority of every task. Frontier models are billed only on the rare turns they're summoned. Your headline price is the amortized average of the two — a single, honest blended rate.

$1.20 / M input tokens
$5.50 / M output tokens

Blended across open + closed models. No per-model accounting.

Tier                                        $/M in   $/M out
Open-source primaries (~90% of tokens)      $0.40    $1.44
Closed-source escalation (~10% of tokens)   $8.13    $42.50
Waymark blended (amortized average)         $1.20    $5.50

blend = 0.9 × open-weights + 0.1 × frontier · input = 0.9(0.40) + 0.1(8.13) = 1.173 ≈ $1.20 · output = 0.9(1.44) + 0.1(42.50) = 5.546 ≈ $5.50 (both rounded to the nearest $0.10)
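To check the blend yourself, here is a small sketch of the arithmetic. The function name `blended_rate` is ours, and the 90/10 token split is the approximate share quoted in the pricing table, not a measured figure.

```python
def blended_rate(open_rate: float, frontier_rate: float, frontier_share: float = 0.10) -> float:
    """Token-weighted average of the open-weights and frontier $/M rates."""
    return (1.0 - frontier_share) * open_rate + frontier_share * frontier_rate


# Rates from the pricing table, in $ per million tokens.
input_rate = blended_rate(0.40, 8.13)     # 0.36 + 0.813
output_rate = blended_rate(1.44, 42.50)   # 1.296 + 4.25

print(f"input:  ${input_rate:.3f} / M tokens")
print(f"output: ${output_rate:.3f} / M tokens")
```

The unrounded results are about $1.173 and $5.546, which round to the $1.20 and $5.50 headline rates. Because the frontier rate is so much higher, the blend is sensitive to the escalation share: with the same function, a 20% share would put the input rate near $1.95.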

One command

The model is the part you swap.
The system is the thing you own.

Flip the whole panel on with one command. Pick a preset, or pin your own reference models and aggregator. Provider-agnostic — plug in Baseten, Cerebras, Anthropic, OpenAI, or your own endpoint.

  • Drop-in: keeps your existing agent loops and tools
  • Per-surface routing: default, review, plan
  • Swap a reference model without touching your workflow
waymark — zsh

$ waymark moa

# turn the panel on

$ waymark model default --provider moa

# route normal turns through the panel

$ waymark model review --provider moa

# use the panel only for reviews

$ waymark presets

# fast-pair · opus-aggregator · budget

panel: glm-5.2 + gpt-oss-120b · aggregator

escalation: opus-4.8 (armed, idle)

✓ mixture of agents active

No harness required

Already have an agent?
Use the panel as a model.

Don't want to use our harness? Point any OpenAI-compatible client at Waymark and the whole mixture of agents answers behind a single model id. Swap your base_url and model — nothing else changes.

  • waymark-moa: balanced panel · auto-escalates
  • waymark-fast: primaries only · never escalates
  • waymark-max: aggressive escalation for hard tasks

Routing decisions come back in x-waymark-route response headers, so the panel stays transparent even over the API.

waymark.sh — chat/completions
curl --location 'https://api.waymark.sh/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${WAYMARK_API_KEY}" \
  --data '{
    "model": "waymark-moa",
    "max_completion_tokens": 1024,
    "temperature": 0.2,
    "top_p": 1,
    "stream": false,
    "reasoning_effort": "medium",
    "messages": [
      { "role": "user", "content": "Why is fast inference important?" }
    ]
  }'
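The same call can be made from Python with only the standard library. This is a sketch under assumptions: the response body is assumed to follow the usual OpenAI-compatible shape (`choices[0].message.content`), the format of the `x-waymark-route` header value is not documented on this page so it is returned as raw strings, and `build_request` and `ask` are illustrative helper names.

```python
import json
import os
import urllib.request
from typing import List, Tuple

API_URL = "https://api.waymark.sh/v1/chat/completions"


def build_request(prompt: str, model: str = "waymark-moa", api_key: str = "") -> urllib.request.Request:
    # Same payload fields as the curl example above.
    payload = {
        "model": model,
        "max_completion_tokens": 1024,
        "temperature": 0.2,
        "stream": False,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def ask(prompt: str, model: str = "waymark-moa") -> Tuple[str, List[str]]:
    req = build_request(prompt, model, os.environ["WAYMARK_API_KEY"])
    with urllib.request.urlopen(req, timeout=60) as resp:
        # Header lookup is case-insensitive; keep every value in case the
        # header is sent more than once (assumption: one value per hop).
        routes = resp.headers.get_all("x-waymark-route") or []
        body = json.load(resp)
    # Assumed OpenAI-compatible response shape.
    return body["choices"][0]["message"]["content"], routes


if __name__ == "__main__":
    answer, routes = ask("Why is fast inference important?")
    print(answer)
    for hop in routes:
        print("route:", hop)
```

Switching `model` to `waymark-fast` or `waymark-max` is the only change needed to try the other two panel ids listed above.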

One bet, two modalities

We proved it in voice.
Now in code.

Speechify's research bet is the same across every modality: preference beats any single model's self-report. A blind judge picking the best of many wins — whether the output is a voice or a diff.

In voice

A blind arena

  1. Listeners hear two clips. They don't know which model made which.
  2. They pick the one they prefer. No spec sheets, no self-reported scores.
  3. Thousands of votes aggregate into an ELO.

SIMBA 3.0 ranks #1

In code

A blind panel

  1. Several models answer your prompt privately, in parallel.
  2. An aggregator reads every answer, blind to who wrote it, and picks the best.
  3. One combined answer comes out.

Waymark beats any single model

Speechify AI Research Lab

Waymark comes from the lab building the world's most preferred voices.

Speechify AI advances speech synthesis, voice cloning, and emotional expression — building voice AI indistinguishable from humans. The way we prove it is the same way Waymark picks an answer: put the options in front of a judge who doesn't know who's who, and let preference decide.

Speechify Voice Arena

blind A/B · ELO
Rank  Model                  Lab          ELO
1     SIMBA 3.0              Speechify    1212
2     Gemini 3.1 Flash TTS   Google       1205
3     Realtime TTS 1.5       Inworld      1199
4     Eleven v3              ElevenLabs   1180
5     Speech 2.8 HDM         MiniMax      1163
6     TTS-1-HD               OpenAI       1096
#1 TTS on voicearena.com & Artificial Analysis
Blind A/B: human listening studies, not self-reported scores
Indistinguishable from human speech: the lab's standing goal

Illustrative ELO from blind pairwise listening tests. Methodology mirrors the Artificial Analysis TTS arena.
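For readers unfamiliar with how pairwise votes become a rating, here is a sketch of the textbook Elo update. How the arena actually computes its scores is not stated here, so treat this as the standard formula only; the K-factor of 32, the starting rating of 1000, and the names `expected_score` and `record_vote` are illustrative choices.

```python
from typing import Dict


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def record_vote(ratings: Dict[str, float], winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one blind A/B vote: the winner gains what the loser gives up."""
    surprise = 1.0 - expected_score(ratings[winner], ratings[loser])
    delta = k * surprise
    ratings[winner] += delta
    ratings[loser] -= delta


# Two hypothetical voices start level; listeners prefer clip_a twice.
ratings = {"clip_a": 1000.0, "clip_b": 1000.0}
record_vote(ratings, "clip_a", "clip_b")
record_vote(ratings, "clip_a", "clip_b")
print(ratings)
```

An upset (a low-rated voice beating a high-rated one) moves the ratings more than an expected win, which is why thousands of votes settle into a stable ordering.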

The other half of agents

Waymark writes the software.
Voice Agents give it a voice.

Build real-time voice agents that listen, think, and speak — powered by the same #1-ranked speech models and the same fast-inference stack that keeps Waymark's panel quick. Tools, knowledge, memory, and telephony through one API, at one all-in rate.

  • Tools & function calling
  • Knowledge base
  • Memory
  • Inbound & outbound telephony
  • Webhooks & events
  • Testing & simulation

Waymark is fast because of the silicon underneath it.

Served on the fastest inference in AI — on purpose.

Provider    Role                      When
Cerebras    wafer-scale inference     Primary · always on
Baseten     dedicated GLM endpoints   Primary · always on
Anthropic   Opus escalation           On escalation
OpenAI      GPT escalation            On escalation

The winning move isn't waiting.
It's the panel.

Install Waymark and point your agent at the panel. Frontier-class output, fast, today — no gated access required.

Available for macOS, Linux, and Windows.