AI

Models

Opus · Sonnet · Haiku · ChatGPT · OpenRouter.

The model choice matters more than the prompt. I run a routing layer — Opus for hard thinking, Sonnet for production defaults, Haiku for fast/cheap, GPT-5 for second opinions, OpenRouter for everything else.

  • Frontier · production · fast tier per task
  • Cost ceilings enforced per pipeline
  • Model swap behind a single interface (LiteLLM / OpenRouter)
Why

Picking one model and forcing every task through it is how AI bills blow up. A routing layer maps each task to the cheapest model that can do it well — and lets you swap when something better ships.

How
  • Tier the work: frontier · production · fast
  • Hard cost ceiling per pipeline, fail loud if hit
  • All providers behind one interface (LiteLLM / OpenRouter)
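
A minimal sketch of the three steps above, assuming a tier-to-model table and a single injected completion function. The `Router` class, the `TIER_MODELS` table, the short model names, and `fake_complete` are all hypothetical stand-ins for illustration, not the actual routing layer; in practice the injected function would wrap whatever unified client is in use (LiteLLM or OpenRouter).

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical tier -> model table; the model names are placeholders,
# not real provider identifiers.
TIER_MODELS: Dict[str, str] = {
    "frontier": "opus",
    "production": "sonnet",
    "fast": "haiku",
}

# The single interface: any provider call reduced to (model, prompt) -> text.
CompletionFn = Callable[[str, str], str]


@dataclass
class Router:
    complete: CompletionFn
    tier_models: Dict[str, str]

    def model_for(self, tier: str) -> str:
        # Fail loudly on an unknown tier instead of silently picking a default.
        if tier not in self.tier_models:
            raise ValueError(f"unknown tier: {tier!r}")
        return self.tier_models[tier]

    def run(self, tier: str, prompt: str) -> str:
        return self.complete(self.model_for(tier), prompt)


def fake_complete(model: str, prompt: str) -> str:
    # Stand-in for a real provider call so the sketch runs offline.
    return f"[{model}] {prompt}"


router = Router(complete=fake_complete, tier_models=TIER_MODELS)
print(router.run("fast", "tag this"))
```

Because callers only name a tier, swapping a model means editing one table entry; no call site changes.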
Proof
Cost reduction vs. flat-Opus: ~60%
Providers wired: Claude · OpenAI · Gemini · Mistral · local
Pipelines on routing: all prod
Model routing — task tier picks the model
Match task to cheapest capable model
FAQ
Why route across three tiers instead of always using the best model?
Cost. The best model costs roughly 5–10× what the production tier does, and most tasks don't need it. Routing keeps quality where it matters and slashes the bill where it doesn't.
Who decides which tier a task goes to?
A scoring rubric per workload type: code review goes to frontier, content tagging goes to the fast tier, content writing goes to production. The rubric is tweakable per project, and the defaults are sensible.
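
As a rough illustration, the rubric can be approximated by a lookup table with per-project overrides. This is a simplification (a plain mapping, not a scored rubric), and `DEFAULT_RUBRIC`, `tier_for`, the workload keys, and the `production` fallback for unlisted workloads are assumptions made for the sketch.

```python
from typing import Dict, Optional

# Defaults taken from the examples in the answer above; keys are hypothetical names.
DEFAULT_RUBRIC: Dict[str, str] = {
    "code_review": "frontier",
    "content_writing": "production",
    "content_tagging": "fast",
}


def tier_for(
    workload: str,
    overrides: Optional[Dict[str, str]] = None,
    fallback: str = "production",  # assumption: unlisted workloads use the default tier
) -> str:
    # Project overrides win over the defaults.
    rubric = {**DEFAULT_RUBRIC, **(overrides or {})}
    return rubric.get(workload, fallback)


print(tier_for("code_review"))
print(tier_for("content_tagging", overrides={"content_tagging": "production"}))
```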
How do you stop a runaway agent from burning through the cost ceiling?
Every run logs token spend against the workload's ceiling. Crossing 80% pages the operator; crossing 100% kills the run. No surprise overspend at month-end.
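
The two thresholds can be sketched as a small guard that each run reports its spend to. `CostGuard`, `CeilingExceeded`, and the `page_operator` callback are hypothetical names, and spend is tracked here as a plain number in whatever cost unit the ceiling uses; the real enforcement may differ.

```python
from dataclasses import dataclass
from typing import Callable, List


class CeilingExceeded(RuntimeError):
    """Raised to stop a run whose spend has reached its ceiling."""


@dataclass
class CostGuard:
    ceiling: float
    page_operator: Callable[[str], None]  # hypothetical alerting hook
    warn_ratio: float = 0.8
    spent: float = 0.0
    warned: bool = False

    def record(self, cost: float) -> None:
        self.spent += cost
        ratio = self.spent / self.ceiling
        if ratio >= 1.0:
            # 100% of the ceiling: fail loudly so the caller stops the run.
            raise CeilingExceeded(f"spent {self.spent:.2f} of {self.ceiling:.2f}")
        if ratio >= self.warn_ratio and not self.warned:
            # 80% of the ceiling: page once, then keep running.
            self.warned = True
            self.page_operator(f"spend at {ratio:.0%} of ceiling")


pages: List[str] = []
guard = CostGuard(ceiling=10.0, page_operator=pages.append)
guard.record(5.0)  # 50%: nothing happens
guard.record(3.5)  # 85%: operator paged
```

Raising an exception rather than returning a flag means a caller that forgets to check still stops.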
In production
  • AIOS routing layer

    Task tier and cost ceiling choose across Claude, GPT-5, Gemini, Mistral, and local models: same prompt, different lane.

  • Cost ceiling enforcement

    Every workload is audited against a token-cost ceiling; crossing 80% of it pages the operator, and crossing 100% stops the run.

  • Per-task model picks

    Hard reasoning → Opus; production defaults → Sonnet; tagging / routing → Haiku. The same router decides for the GPT-5 and Gemini tiers.