Why route across three tiers instead of always using the best model?

Cost. The best model burns 5–10× the tokens of the production tier on tasks that don't need it. Routing keeps quality where it matters and slashes the bill where it doesn't.

Who decides which tier a task goes to?

A scoring rubric per workload type — code review goes to frontier, content tagging goes to fast tier, content writing goes to production. Tweakable per project, defaults are sensible.

How do you stop a runaway agent from burning through the cost ceiling?

Every run logs token spend against the workload's ceiling. Crossing 80% pages the operator; crossing 100% kills the run. No surprises at month-end, no overspend on the bill.

Models

Opus · Sonnet · Haiku · ChatGPT · OpenRouter.

Model routing — task tier picks the model

Match task to cheapest capable model

Hover or tap a node to see details.

The model choice matters more than the prompt. I run a routing layer — Opus for hard thinking, Sonnet for production defaults, Haiku for fast/cheap, GPT-5 for second opinions, OpenRouter for everything else.

Frontier · production · fast tier per task
Cost ceilings enforced per pipeline
Model swap behind a single interface (LiteLLM / OpenRouter)

Why

Picking one model and forcing every task through it is how AI bills blow up. A routing layer maps each task to the cheapest model that can do it well — and lets you swap when something better ships.

How

Tier the work: frontier · production · fast
Hard cost ceiling per pipeline, fail loud if hit
All providers behind one interface (LiteLLM / OpenRouter)

Proof

Cost reduction vs. flat-Opus: ~60%
Providers wired: Claude · OpenAI · Gemini · Mistral · local
Pipelines on routing: all prod

FAQ

Why route across three tiers instead of always using the best model?: Cost. The best model burns 5–10× the tokens of the production tier on tasks that don't need it. Routing keeps quality where it matters and slashes the bill where it doesn't.
Who decides which tier a task goes to?: A scoring rubric per workload type — code review goes to frontier, content tagging goes to fast tier, content writing goes to production. Tweakable per project, defaults are sensible.
How do you stop a runaway agent from burning through the cost ceiling?: Every run logs token spend against the workload's ceiling. Crossing 80% pages the operator; crossing 100% kills the run. No surprises at month-end, no overspend on the bill.

In production

AIOS routing layer
Task tier + cost ceiling chooses across Claude, GPT-5, Gemini, Mistral, local — same prompt, different lane.
See it
Cost ceiling enforcement
Every workload audited against a token-cost ceiling; crossing it pages the operator and pauses the agent.
Per-task model picks
Hard reasoning → Opus; production defaults → Sonnet; tagging / routing → Haiku. Same router decides for GPT-5 + Gemini tiers.

Ping Mat See pricing

Claude

Anthropic's frontier model family.

ChatGPT

OpenAI's general-purpose assistant.

Back to AI