Mixture of Agents · v1.0

A panel of experts beats one genius.

One prompt. Several models think in parallel. One clean answer comes out: better than any of them alone, and faster than the frontier model you can no longer get access to.

Available for macOS, Linux, and Windows.

Get Waymark

$ curl -fsSL https://waymark.sh/install | bash

Or read the documentation

The mechanism

The reference models are the panel.
The aggregator is the chair.

One question goes in. Several models think privately, in parallel. A sharp chair reads every answer and returns the best combined one — at the latency of the fast models, not the slow ones.

1 prompt → many models think → 1 answer, better than any of them alone.
Opus & GPT join the panel only when the task earns the spend.
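
The panel-and-chair flow above can be sketched in a few lines. This is a minimal illustration, not Waymark's implementation: the stub models, the `longest_draft_chair` rule, and the `should_escalate` hook are all hypothetical stand-ins. A real chair would be a model call that synthesizes the drafts.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional

Model = Callable[[str], str]
Chair = Callable[[str, List[str]], str]


def run_panel(
    prompt: str,
    panel: List[Model],
    chair: Chair,
    escalation: Optional[Model] = None,
    should_escalate: Optional[Callable[[str, List[str]], bool]] = None,
) -> str:
    """Ask every reference model in parallel, then let the chair pick or combine."""
    with ThreadPoolExecutor(max_workers=len(panel)) as pool:
        # pool.map keeps the drafts in panel order.
        drafts = list(pool.map(lambda model: model(prompt), panel))
    # Hypothetical escalation hook: a frontier model joins only when asked to.
    if escalation and should_escalate and should_escalate(prompt, drafts):
        drafts.append(escalation(prompt))
    # The chair sees only the drafts, not which model wrote which.
    return chair(prompt, drafts)


# Stand-in models (assumptions); real ones would call an inference endpoint.
def glm_stub(prompt: str) -> str:
    return "use short-lived tokens"


def gpt_oss_stub(prompt: str) -> str:
    return "use short-lived tokens and rotate session ids on login"


def frontier_stub(prompt: str) -> str:
    return "use short-lived tokens, rotate session ids on login, and bind sessions to the client"


def longest_draft_chair(prompt: str, drafts: List[str]) -> str:
    # Toy aggregation rule (assumption): keep the most detailed draft.
    return max(drafts, key=len)


answer = run_panel("harden auth session handling", [glm_stub, gpt_oss_stub], longest_draft_chair)
print(answer)
```

Because the reference models run concurrently, wall-clock time for the panel is roughly that of its slowest member plus the chair, which is why fast primaries matter.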

Watch it route

You can see the panel think.

The two complaints about routed models — they're slow, and they're a black box — are exactly what Waymark fixes. Fast primaries on Cerebras and Baseten keep latency low; rare escalations only fire when a task earns them; and the routing is printed to your terminal as it happens.

Sub-5 min tasks stay sub-5 min: they finish faster than a lone Opus or Codex call when the frontier isn't needed.
Long tasks beat the frontier: when a job is genuinely hard, the panel out-reasons any single model on the board.
Every hop visible as it happens: no black box. Watch each model, each escalation, each handoff, live and replayable.

waymark — agent

routing live

$ waymark run "harden auth session handling"

→ plan    harden auth session handling   GLM-5.2        Baseten      1.6s
→ edit    src/auth/session.ts            GPT-OSS-120B   Cerebras     2.1s
→ edit    src/auth/tokens.ts             GPT-OSS-120B   Cerebras     1.9s
→ test    pnpm test auth                 GLM-5.2        Baseten      2.4s

review needs deeper reasoning — escalating GLM-5.2 → GPT-5.5 (Codex) · rare

→ review  diff · 3 files                 GPT-5.5        escalation   6.3s

panel agreed · 1 answer returned · you saw every hop · 14.3s

Why Waymark

Stop chasing the model.
Build the system around it.

A mix of today's models beats the best single model you can no longer get — with no waiting, no gated access, and none of the latency that made the first wave of orchestrators painful to use.

Mixture of Agents was slow. Waymark isn't.

Primaries run on Cerebras and Baseten, not on a frontier API queue. The whole panel returns before a single Opus call clears its throat.

~4s median panel, simple task

Frontier answers, GLM-class bill.

Cheap fast models carry the load. Waymark escalates to Opus or GPT only when the task earns it — so you pay frontier prices on the rare turn that needs them.

~6× cheaper than an Opus-max run

No black box. A glass one.

Every model that ran, every vote it cast, every escalation it triggered — all logged and replayable. The opacity people hate in routed models, gone.

100% of agent calls inspectable

DeepSWE Benchmark Score

Average pass@1 on DeepSWE · higher is better

Legend: Waymark · Frontier · Open weights

Harness	Model	Effort	Blended $/task
Waymark (you)	MoA · GLM + GPT-OSS	auto-escalate	$1.20
Claude Code	Fable 5	max · gated	—
Codex	GPT-5.5	xhigh	$9.40
Codex	GPT-5.5	medium	$4.10
Claude Code	Opus 4.8	max	$11.80
Claude Code	Opus 4.8	medium	$6.20
Opencode	Opus 4.7	medium	$3.90
Cursor CLI	GPT-5.5	medium	$3.40
Claude Code	GLM-5.2	—	$1.10
Claude Code	GLM-5.1	—	$0.90
Gemini CLI	Gemini 3.1 Pro	high	$2.80

Fable-class score, GLM-class cost. Waymark tops the board at a fraction of the per-task spend — and Fable itself is gated to 100 partners. The panel ships what the frontier won't.

Run it yourself

Illustrative pass@1 on DeepSWE; cost is blended $/task at default settings. Independent eval forthcoming.

Pricing

Pay the blended rate,
not the frontier rate.

Cheap open-weights carry the vast majority of every task. Frontier models are billed only on the rare turns they're summoned. Your headline price is the amortized average of the two — a single, honest blended rate.

$1.20 / M input tokens

$5.50 / M output tokens

Blended across open + closed models. No per-model accounting.

Tier	Share of tokens	Models	$/M in	$/M out
Open-source primaries	~90%	GLM-5.2 · GPT-OSS-120B	$0.40	$1.44
Closed-source escalation	~10%	Opus 4.8 · GPT-5.5	$8.13	$42.50
Waymark blended	amortized average	what you actually pay	$1.20	$5.50

blend = 0.9 × open-weights + 0.1 × frontier · input: 0.9(0.40) + 0.1(8.13) ≈ $1.17 · output: 0.9(1.44) + 0.1(42.50) ≈ $5.55 · the ~90/10 split is approximate, so both land near the $1.20 and $5.50 headline rates
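
The blend arithmetic can be checked with a short script. This is a sketch using only the rates in the pricing table; the function names are hypothetical. At an exact 90/10 split the blend works out to about $1.17 in and $5.55 out, and the second helper shows what escalation share the $1.20 and $5.50 headline rates would imply.

```python
def blended_rate(open_rate: float, frontier_rate: float, frontier_share: float = 0.10) -> float:
    """Token-weighted average of the open-weights and frontier rates ($ per M tokens)."""
    return (1 - frontier_share) * open_rate + frontier_share * frontier_rate


def implied_frontier_share(blend: float, open_rate: float, frontier_rate: float) -> float:
    """Solve the blend formula for the frontier share that yields a given blended rate."""
    return (blend - open_rate) / (frontier_rate - open_rate)


# Rates taken from the pricing table above.
input_rate = blended_rate(0.40, 8.13)
output_rate = blended_rate(1.44, 42.50)
print(f"exact 90/10 blend: ${input_rate:.2f} in, ${output_rate:.2f} out")

# Share of tokens that would have to escalate to hit the headline rates exactly.
input_share = implied_frontier_share(1.20, 0.40, 8.13)
output_share = implied_frontier_share(5.50, 1.44, 42.50)
print(f"implied escalation share: {input_share:.1%} in, {output_share:.1%} out")
```

Both implied shares sit close to 10%, which is consistent with the "~10% of tokens" figure in the table.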

One command

The model is the part you swap.
The system is the thing you own.

Flip the whole panel on with one command. Pick a preset, or pin your own reference models and aggregator. Provider-agnostic — plug in Baseten, Cerebras, Anthropic, OpenAI, or your own endpoint.

Drop-in: keeps your existing agent loops and tools
Per-surface routing: default, review, plan
Swap a reference model without touching your workflow

waymark — zsh

$ waymark moa                            # turn the panel on
$ waymark model default --provider moa   # route normal turns through the panel
$ waymark model review --provider moa    # use the panel only for reviews
$ waymark presets                        # fast-pair · opus-aggregator · budget

panel: glm-5.2 + gpt-oss-120b → aggregator
escalation: opus-4.8 (armed, idle)
✓ mixture of agents active

No harness required

Already have an agent?
Use the panel as a model.

Don't want to use our harness? Point any OpenAI-compatible client at Waymark and the whole mixture of agents answers behind a single model id. Swap your base_url and model — nothing else changes.

waymark-moa: balanced panel · auto-escalates
waymark-fast: primaries only · never escalates
waymark-max: aggressive escalation for hard tasks

Routing decisions come back in x-waymark-route response headers, so the panel stays transparent even over the API.

waymark.sh — chat/completions

curl --location 'https://api.waymark.sh/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${WAYMARK_API_KEY}" \
  --data '{
    "model": "waymark-moa",
    "max_completion_tokens": 1024,
    "temperature": 0.2,
    "top_p": 1,
    "stream": false,
    "reasoning_effort": "medium",
    "messages": [
      { "role": "user", "content": "Why is fast inference important?" }
    ]
  }'
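
The same request can be made from Python with only the standard library. This is a sketch, not an official client. Two things are assumptions: that the response body follows the usual OpenAI chat-completions shape (`choices[0].message.content`), and that routing headers can be picked out by the `x-waymark-route` name prefix. Their exact format is not documented here, so they are returned unparsed.

```python
import json
import os
import urllib.request

API_URL = "https://api.waymark.sh/v1/chat/completions"  # endpoint from the curl example


def build_request(prompt: str, api_key: str, model: str = "waymark-moa") -> urllib.request.Request:
    """Build the POST request without sending it."""
    payload = {
        "model": model,
        "max_completion_tokens": 1024,
        "temperature": 0.2,
        "stream": False,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def ask(prompt: str, api_key: str):
    """Send the request; return the answer text and any routing headers."""
    with urllib.request.urlopen(build_request(prompt, api_key), timeout=60) as resp:
        body = json.load(resp)
        # Assumption: routing info arrives in headers whose names start with x-waymark-route.
        route = {k: v for k, v in resp.headers.items() if k.lower().startswith("x-waymark-route")}
    # Assumption: OpenAI-compatible response shape.
    return body["choices"][0]["message"]["content"], route


if __name__ == "__main__" and os.environ.get("WAYMARK_API_KEY"):
    text, route = ask("Why is fast inference important?", os.environ["WAYMARK_API_KEY"])
    print(text)
    print(route)
```

Switching to `waymark-fast` or `waymark-max` only changes the `model` argument.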

One bet, two modalities

We proved it in voice.
Now in code.

Speechify's research bet is the same across every modality: preference beats any single model's self-report. A blind judge picking the best of many wins — whether the output is a voice or a diff.

In voice

A blind arena

1. Listeners hear two clips; they don't know which model made which.
2. They pick the one they prefer. No spec sheets, no self-reported scores.
3. Thousands of votes aggregate into an Elo rating.

→ SIMBA 3.0 ranks #1

same method

In code

A blind panel

1. Several models answer your prompt privately, in parallel.
2. An aggregator reads every answer, blind to who wrote it, and picks the best.
3. One combined answer comes out.

→ Waymark beats any single model

Speechify AI Research Lab

Waymark comes from the lab building the world's most preferred voices.

Speechify AI advances speech synthesis, voice cloning, and emotional expression — building voice AI indistinguishable from humans. The way we prove it is the same way Waymark picks an answer: put the options in front of a judge who doesn't know who's who, and let preference decide.

Speechify Voice Arena · blind A/B · Elo

Rank	Model	Provider	Elo
1	SIMBA 3.0	Speechify	1212
2	Gemini 3.1 Flash TTS	Google	1205
3	Realtime TTS 1.5	Inworld	1199
4	Eleven v3	ElevenLabs	1180
5	Speech 2.8 HD	MiniMax	1163
6	TTS-1-HD	OpenAI	1096

#1: TTS on voicearena.com & Artificial Analysis
Blind A/B: human listening studies, not self-reported scores
Indistinguishable: from human speech — the lab's standing goal

Illustrative Elo ratings from blind pairwise listening tests. Methodology mirrors the Artificial Analysis TTS arena.
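
For readers unfamiliar with how pairwise votes become a rating, here is a minimal sketch of the textbook Elo update. It is an assumption that the arena uses this exact rule; many arenas fit ratings in batch instead. The K-factor of 32, the starting rating of 1000, and the voice names are illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Apply one blind A/B vote and return the new (rating_a, rating_b)."""
    delta = k * ((1.0 if a_won else 0.0) - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating stays constant.
    return rating_a + delta, rating_b - delta


# Hypothetical votes: each entry names the clip the listener preferred.
ratings = {"voice_a": 1000.0, "voice_b": 1000.0}
votes = ["voice_a", "voice_a", "voice_b", "voice_a"]
for winner in votes:
    ratings["voice_a"], ratings["voice_b"] = update(
        ratings["voice_a"], ratings["voice_b"], winner == "voice_a"
    )

print(ratings)
# Preference probability implied by the top two leaderboard ratings above.
print(round(expected_score(1212, 1205), 2))
```

Under this model a 7-point gap, like the one between the top two entries, corresponds to only a slight preference, about 51% to 49%.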

The other half of agents

Waymark writes the software.
Voice Agents give it a voice.

Build real-time voice agents that listen, think, and speak — powered by the same #1-ranked speech models and the same fast-inference stack that keeps Waymark's panel quick. Tools, knowledge, memory, and telephony through one API, at one all-in rate.

Explore Voice Agents API · Read the docs

Tools & function calling
Knowledge base
Memory
Inbound & outbound telephony
Webhooks & events
Testing & simulation

Waymark is fast because of the silicon underneath it.

Served on the fastest inference in AI — on purpose.

Cerebras: wafer-scale inference (primary · always on)
Baseten: dedicated GLM endpoints (primary · always on)
Anthropic: Opus escalation (on escalation)
OpenAI: GPT escalation (on escalation)

The winning move isn't waiting.
It's the panel.

Install Waymark and point your agent at the panel. Frontier-class output, fast, today — no gated access required.
