AI · Claude · Cheat Sheet

Claude cheat sheet.

Model IDs, thinking budgets, caching, tools, batch — the working reference for building on Claude.

This is the sheet I keep open when wiring Claude into a production stack. Model IDs are current for the 4.x family (Opus 4.7, Sonnet 4.6, Haiku 4.5). Everything here compiles against the live Anthropic Messages API.

Model IDs (4.x family)

4 entries

Current Claude model IDs for the Anthropic API. Default to Sonnet unless the task truly needs Opus or Haiku speed.

  • claude-opus-4-7Opus

    Frontier reasoning — hard refactors, architecture, math-heavy proofs.

  • claude-sonnet-4-6Sonnet

    Balanced default — production coding, long docs, tool use.

  • claude-haiku-4-5-20251001Haiku

    Fast + cheap — classifiers, extractors, high-QPS routers.

  • claude-opus-4-7[1m]1M ctx

    Opus with the 1M context window — whole-repo reasoning.

Extended thinking

4 entries

Give Claude private thinking budget before it answers. Great for planning, tricky bugs, and architecture calls.

  • thinking: { type: "enabled", budget_tokens: 8000 }

    Enable extended thinking on the messages API.

  • budget_tokens: 32000

    Deep think — architecture, math, long chain-of-thought.

  • budget_tokens: 4000

    Light think — enough to plan a small refactor.

  • stream: true

    Stream thinking + response together; thinking blocks arrive first.

Prompt caching

4 entries

5-minute TTL. Cache expensive system prompts, tool schemas, and large context blocks; save 90% on repeat calls.

  • cache_control: { type: "ephemeral" }

    Mark the end of a cacheable prefix block.

  • Up to 4 breakpoints

    You can cache the system prompt, tools, and up to two message-level prefixes.

  • 1024-token minimum

    Cache reads require ≥1024 cached tokens (2048 for Opus).

  • cache_creation_input_tokens

    First call: paid at 1.25×. Subsequent hits at 0.1×.

Tool use

4 entries

Function calling. Claude picks a tool, the client runs it, you feed the result back.

  • tools: [...]

    Array of tool definitions with JSON Schema input_schema.

  • tool_choice: { type: "auto" }

    Let Claude decide. Also: any, tool, none.

  • tool_use / tool_result

    Two content-block types that pair per call.

  • disable_parallel_tool_use

    Force serial tool calls when order matters.

Computer use

4 entries

The screen-control primitive. Claude sees a screenshot and emits mouse/keyboard actions.

  • anthropic-beta: computer-use-2025-*

    Beta header required on the messages API.

  • type: "computer_20250124"

    Latest computer-use tool schema.

  • screenshot / click / type / key

    Action types Claude emits back for you to execute.

  • Sandbox itSafety

    Never point computer use at your real desktop — VM / container only.

Batch API

4 entries

Async batch endpoint — 50% off list price, up to 24h SLA. Perfect for evals + backfills.

  • POST /v1/messages/batches

    Submit up to 10,000 requests per batch, 256MB total.

  • 50% discount

    Half the per-token cost of the interactive API.

  • 24h SLA

    Most batches complete in minutes, but plan for a day.

  • GET .../results

    JSONL results, one line per custom_id.

SDKs

4 entries
  • pip install anthropic

    Python SDK — sync + async clients, streaming, tools, files.

  • npm i @anthropic-ai/sdk

    TypeScript SDK — the same API surface, first-class types.

  • @anthropic-ai/claude-agent-sdk

    Build custom agents on managed infrastructure (Managed Agents).

  • ANTHROPIC_API_KEY

    Env var both SDKs read by default.

Cost ceilings

4 entries

Don't get surprised. Set spend guards at the org, API key, and app layers.

  • Console → Usage → Limits

    Set org-level monthly spend cap.

  • Per-key limits

    Bound each API key by RPM / TPM / daily spend.

  • max_tokens ceiling

    Hard-cap output tokens per call in code.

  • OpenRouter fallbackFleet

    Route around 5xx with a paid model first — see AIOS gotchas.

Living document. If the API changes and this drifts, that's a bug — ping Mat and it gets a same-day patch.