Models
Opus · Sonnet · Haiku · ChatGPT · OpenRouter.
The model choice matters more than the prompt. I run a routing layer — Opus for hard thinking, Sonnet for production defaults, Haiku for fast/cheap, GPT-5 for second opinions, OpenRouter for everything else.
- Frontier · production · fast tier per task
- Cost ceilings enforced per pipeline
- Model swap behind a single interface (LiteLLM / OpenRouter)
Picking one model and forcing every task through it is how AI bills blow up. A routing layer maps each task to the cheapest model that can do it well — and lets you swap when something better ships.
- Tier the work: frontier · production · fast
- Hard cost ceiling per pipeline, fail loud if hit
- All providers behind one interface (LiteLLM / OpenRouter)
- Cost reduction vs. flat-Opus
- ~60%
- Providers wired
- Claude · OpenAI · Gemini · Mistral · local
- Pipelines on routing
- all prod
- Why route across three tiers instead of always using the best model?
- Cost. The best model burns 5–10× the tokens of the production tier on tasks that don't need it. Routing keeps quality where it matters and slashes the bill where it doesn't.
- Who decides which tier a task goes to?
- A scoring rubric per workload type — code review goes to frontier, content tagging goes to fast tier, content writing goes to production. Tweakable per project, defaults are sensible.
- How do you stop a runaway agent from burning through the cost ceiling?
- Every run logs token spend against the workload's ceiling. Crossing 80% pages the operator; crossing 100% kills the run. No surprises at month-end, no overspend on the bill.
- AIOS routing layer
Task tier + cost ceiling chooses across Claude, GPT-5, Gemini, Mistral, local — same prompt, different lane.
See it - Cost ceiling enforcement
Every workload audited against a token-cost ceiling; crossing it pages the operator and pauses the agent.
- Per-task model picks
Hard reasoning → Opus; production defaults → Sonnet; tagging / routing → Haiku. Same router decides for GPT-5 + Gemini tiers.