Gemini
Google's multimodal model family.
Gemini's strength is multimodal input: long video, long audio, and tight integration with the Google stack (Drive, Workspace, NotebookLM). I use it in pipelines that need to ingest the messy real world.
- Best-in-class video + audio understanding
- NotebookLM for source-grounded research
- Tight Workspace integration for back-office work
Why
Multimodal input is where Gemini wins outright: long video, long audio, deep Workspace integration. If the pipeline has to ingest the messy real world, this is the model that does it at the lowest cost.
How
- Gemini for video/audio ingestion + Workspace ops
- NotebookLM for source-grounded research bundles
- Hand off generation to Claude or specialised image/video models
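The division of labour in the list above can be sketched as a two-stage pipeline: ingest once, then delegate generation by output type. This is a minimal illustration under stated assumptions, not the production setup. `Handoff`, `ingest_with_gemini`, `generate_text`, `generate_image`, and `run_pipeline` are hypothetical names, and the stage bodies are stubs standing in for real API calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Handoff:
    """Structured result passed from ingestion to generation (hypothetical shape)."""
    source: str
    summary: str


def ingest_with_gemini(path: str) -> Handoff:
    # Stub: a real version would send the audio/video file to Gemini
    # and parse a structured summary out of the response.
    return Handoff(source=path, summary=f"summary of {path}")


def generate_text(handoff: Handoff) -> str:
    # Stub for the text hand-off (Claude in the list above).
    return f"[text draft from {handoff.summary}]"


def generate_image(handoff: Handoff) -> str:
    # Stub for a specialised image model.
    return f"[image prompt from {handoff.summary}]"


# Generation is delegated by output type rather than done by the ingesting model.
GENERATORS: dict[str, Callable[[Handoff], str]] = {
    "text": generate_text,
    "image": generate_image,
}


def run_pipeline(path: str, output: str = "text") -> str:
    """Ingest once, then hand the structured result to the chosen generator."""
    handoff = ingest_with_gemini(path)
    return GENERATORS[output](handoff)
```

Keeping the hand-off as a plain data object means the ingesting model and the generating model never need to know about each other, so either side can be swapped without touching the other.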
Proof
- Hours of audio ingested: 200+
- NotebookLM workflows: PAD · LAD · VAD
- Workspace integrations: Drive · Gmail · Docs
Diagram: Gemini, multimodal in, anything out. Text · Image · Audio · Video in one model family.
FAQ
- When does Gemini beat Claude or GPT?
  When the input is natively multimodal. Gemini ingests text, image, audio, and video in the same prompt without stitching, which makes it the cleanest fit for cross-modality reasoning and for Veo/Imagen pipelines.
- Which Gemini tier should I start with?
  Flash for production defaults (cheap and fast), Pro for harder reasoning, Imagen/Veo for media generation. The router layer in AIOS picks per task.
- Can I mix Gemini with Claude in one pipeline?
  Yes, and you should. Use Gemini for multimodal input, hand its structured output to Claude for reasoning, and drive the build from Claude Code. Each model stays in its strongest lane.
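The tier rule of thumb from the FAQ can be written down as a small routing function. This is only a sketch of the idea, not the AIOS router itself: `Task` and `pick_tier` are hypothetical names, the returned strings are placeholder labels rather than real model IDs, and how a task gets flagged as hard reasoning is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    output: str = "text"          # "text", "image", or "video"
    hard_reasoning: bool = False  # set by the caller; the judgement is out of scope here


def pick_tier(task: Task) -> str:
    """Map a task to a tier label (placeholder labels, not real model IDs)."""
    if task.output == "image":
        return "imagen"
    if task.output == "video":
        return "veo"
    if task.output != "text":
        raise ValueError(f"unknown output type: {task.output!r}")
    # Text tasks default to the cheap, fast tier unless flagged as hard.
    return "pro" if task.hard_reasoning else "flash"
```

Defaulting to the cheap tier and escalating only on an explicit flag keeps the expensive path opt-in.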
In production
- VIDAI pipeline
Veo and Imagen via Gemini drive the multi-channel video pipeline: 1,000+ shorts published to YouTube, TikTok, and Instagram.
- Cross-modality client brief
Voice memo, diagram photo, and PDF go into one prompt; Gemini summarises across all three for the intake agent.
- Fleet image router
Imagen is one of four providers in IMAGAI, picked when the brief is photoreal and brand-tight.
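As a rough illustration of the cross-modality brief, the sketch below assembles several files and one instruction into a single ordered list of prompt parts. It is an assumption-heavy sketch: `build_parts` and `MIME_BY_SUFFIX` are hypothetical names, the part dictionaries are a made-up shape rather than any SDK's request format, and nothing here reads the files or calls a model.

```python
from pathlib import PurePath

# Explicit suffix-to-MIME table so the result does not depend on the host system.
MIME_BY_SUFFIX = {
    ".mp3": "audio/mpeg",
    ".jpg": "image/jpeg",
    ".png": "image/png",
    ".pdf": "application/pdf",
}


def build_parts(paths: list[str], instruction: str) -> list[dict[str, str]]:
    """Return file parts in the given order, followed by one text instruction."""
    parts: list[dict[str, str]] = []
    for path in paths:
        suffix = PurePath(path).suffix.lower()
        if suffix not in MIME_BY_SUFFIX:
            raise ValueError(f"unsupported file type: {path}")
        parts.append({"file": path, "mime_type": MIME_BY_SUFFIX[suffix]})
    # The instruction goes last so it can refer to every attachment.
    parts.append({"text": instruction})
    return parts
```

For example, `build_parts(["memo.mp3", "diagram.jpg", "brief.pdf"], "Summarise across all three.")` yields three file parts and one text part, which a real client would then translate into its own request format.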