AI · Claude · Cheat Sheet

Claude cheat sheet.

Model IDs, thinking budgets, caching, tools, batch — the working reference for building on Claude.

This is the sheet I keep open when wiring Claude into a production stack. Model IDs are current for the 4.x family (Opus 4.7, Sonnet 4.6, Haiku 4.5). Everything here compiles against the live Anthropic Messages API.

Model IDs (4.x family)

4 entries

Current Claude model IDs for the Anthropic API. Default to Sonnet unless the task truly needs Opus or Haiku speed.

claude-opus-4-7Opus
Frontier reasoning — hard refactors, architecture, math-heavy proofs.
claude-sonnet-4-6Sonnet
Balanced default — production coding, long docs, tool use.
claude-haiku-4-5-20251001Haiku
Fast + cheap — classifiers, extractors, high-QPS routers.
claude-opus-4-7[1m]1M ctx
Opus with the 1M context window — whole-repo reasoning.

Extended thinking

4 entries

Give Claude private thinking budget before it answers. Great for planning, tricky bugs, and architecture calls.

thinking: { type: "enabled", budget_tokens: 8000 }
Enable extended thinking on the messages API.
budget_tokens: 32000
Deep think — architecture, math, long chain-of-thought.
budget_tokens: 4000
Light think — enough to plan a small refactor.
stream: true
Stream thinking + response together; thinking blocks arrive first.

Prompt caching

4 entries

5-minute TTL. Cache expensive system prompts, tool schemas, and large context blocks; save 90% on repeat calls.

cache_control: { type: "ephemeral" }
Mark the end of a cacheable prefix block.
Up to 4 breakpoints
You can cache the system prompt, tools, and up to two message-level prefixes.
1024-token minimum
Cache reads require ≥1024 cached tokens (2048 for Opus).
cache_creation_input_tokens
First call: paid at 1.25×. Subsequent hits at 0.1×.

Tool use

4 entries

Function calling. Claude picks a tool, the client runs it, you feed the result back.

tools: [...]
Array of tool definitions with JSON Schema input_schema.
tool_choice: { type: "auto" }
Let Claude decide. Also: any, tool, none.
tool_use / tool_result
Two content-block types that pair per call.
disable_parallel_tool_use
Force serial tool calls when order matters.

Computer use

4 entries

The screen-control primitive. Claude sees a screenshot and emits mouse/keyboard actions.

anthropic-beta: computer-use-2025-*
Beta header required on the messages API.
type: "computer_20250124"
Latest computer-use tool schema.
screenshot / click / type / key
Action types Claude emits back for you to execute.
Sandbox itSafety
Never point computer use at your real desktop — VM / container only.

Batch API

4 entries

Async batch endpoint — 50% off list price, up to 24h SLA. Perfect for evals + backfills.

POST /v1/messages/batches
Submit up to 10,000 requests per batch, 256MB total.
50% discount
Half the per-token cost of the interactive API.
24h SLA
Most batches complete in minutes, but plan for a day.
GET .../results
JSONL results, one line per custom_id.

SDKs

4 entries

pip install anthropic
Python SDK — sync + async clients, streaming, tools, files.
npm i @anthropic-ai/sdk
TypeScript SDK — the same API surface, first-class types.
@anthropic-ai/claude-agent-sdk
Build custom agents on managed infrastructure (Managed Agents).
ANTHROPIC_API_KEY
Env var both SDKs read by default.

Cost ceilings

4 entries

Don't get surprised. Set spend guards at the org, API key, and app layers.

Console → Usage → Limits
Set org-level monthly spend cap.
Per-key limits
Bound each API key by RPM / TPM / daily spend.
max_tokens ceiling
Hard-cap output tokens per call in code.
OpenRouter fallbackFleet
Route around 5xx with a paid model first — see AIOS gotchas.

Ping Mat See pricing

Living document. If the API changes and this drifts, that's a bug — ping Mat and it gets a same-day patch.

Back to Claude