Models
A reference catalog of the AI models we use, recommend, or compare in the playbook: what each model is for, where to try it, and when we last verified the entry.
This catalog lists 77 models.
Gemini 3.1 Flash-Lite
Google's low-latency Gemini 3-series workhorse for straightforward multimodal tasks at scale. It is designed for high-frequency agent routing, extraction, translation, and summarization work.
Gemini 3.1 Flash Live Preview
Google's current low-latency audio-to-audio Live API model for real-time dialogue and voice-first applications. It replaces the earlier Gemini Live surface with the Gemini 3.1 stack.
Gemini 3.1 Pro Preview
Google's top Gemini 3-series model for advanced reasoning, coding, and agentic workflows. It improves the Gemini 3 Pro line with better thinking, tool use, and factual consistency.
Gemini 3 Flash Preview
Google's fast Gemini 3-series model. It targets frontier-class multimodal understanding and agentic coding behavior at a lower cost tier than Pro.
Gemma 4 31B
Google's flagship open-weight Gemma 4. Natively multimodal (text + vision); supersedes Gemma 3 with significantly improved capabilities and a larger size.
Veo 3.1
Google's flagship video generation model. Adds advanced creative controls and improved prompt adherence on top of the Veo 3 native-audio foundation.
Veo 3.1 Lite
Google's efficient, developer-first variant of Veo 3.1. Lower cost and faster generation; same family as the main 3.1 model with reduced fidelity for tighter feedback loops.
DeepSeek V4 Flash
DeepSeek's efficient tier of the V4 generation. Faster and cheaper than V4 Pro; the practical default for high-throughput agentic workloads.
DeepSeek V4 Pro
DeepSeek's flagship general-purpose MoE model. Successor to V3; competitive with closed frontier-tier models at open-weights cost.
Qwen 3.5 122B
Alibaba's flagship open-weight model from the Qwen 3.5 generation. MoE architecture with 122B total / 10B active parameters; native multimodal (text + vision).
GPT Image 2
OpenAI's current image generation and editing model. It replaces the older DALL·E line with a single state-of-the-art model for image creation and edits.
Mistral Medium 3.5
Mistral's frontier-class multimodal model for agentic and coding use cases. It sits between the largest flagship tier and the lighter Small line while remaining open-weight.
Mistral Small 4
Mistral's hybrid open model that unifies instruct, reasoning, and coding in a single efficient line. It is the current small generalist flagship in Mistral's open lineup.
FLUX.2 [dev]
Black Forest Labs' next-generation open-weight image model. Supersedes FLUX.1 [dev] with improved quality and control; remains the open-weights choice for self-hosted image generation.
Claude 4.7 Opus
Anthropic's flagship model — the strongest Claude variant for analysis, long-context reasoning, and complex agentic work. Default choice when you want the highest-quality Claude output.
OCR 3
Mistral's current OCR service for its Document AI stack. It extracts interleaved text and images from documents and replaces the older Mistral OCR line.
Devstral 2
Mistral's frontier code-agents model for software engineering tasks. It is designed for tool-heavy coding workflows across whole repositories and multi-file edits.
Ministral 3 14B
The largest model in Mistral's Ministral 3 family. It is built for local deployment with strong text and vision performance on diverse hardware.
Mistral Large 3
Mistral AI's flagship model. Successor to Mistral Large 2 with improved multilingual coverage and reasoning. EU-jurisdiction provider.
Claude Haiku 4.5
Anthropic's fast, cheap tier. The right choice for high-throughput agentic work and tasks where latency matters more than depth.
Claude 4.6 Sonnet
Anthropic's mid-tier model — the practical default for production workloads. Balances quality and cost for most applications.
Magistral Medium 1.2
Mistral's frontier-class multimodal reasoning model. It is the dedicated reasoning line for deeper multi-step analysis where Mistral Small 4 or Medium 3.5 would be too shallow.
Gemini 2.5 Deep Think
Google's enhanced reasoning mode on top of Gemini 2.5 Pro. Trades latency for depth on hard math, science, and multi-step problem-solving.
Codestral 25.08
Mistral's current code-completion model, released at the end of July 2025. It is tuned for low-latency fill-in-the-middle and high-frequency code generation tasks.
Llama 4 Maverick
Meta's flagship Llama 4 model — natively multimodal, larger MoE architecture than Scout. The Llama 4 frontier-tier entry.
Llama 4 Scout
Meta's small/efficient Llama 4 variant — natively multimodal MoE architecture. The practical Llama 4 entry point for self-hosted multimodal applications.
o3
OpenAI's flagship reasoning model. Uses extended chain-of-thought before answering, trading latency for depth on math, science, and complex coding tasks.
o4-mini
OpenAI's small reasoning model. Faster and cheaper than o3 while keeping the chain-of-thought architecture; the practical default for routine reasoning tasks.
Kling 2.0
Kuaishou's video generation model. Strong on human motion and physical realism; popular for portrait and character-driven generation.
GPT-4.1
OpenAI's developer-first frontier model for coding, instruction following, and long-context work. It is the API-oriented successor line to older GPT-4 variants.
GPT-4.1 mini
OpenAI's smaller GPT-4.1 variant. It keeps the 1M-token context window while lowering cost and latency enough for high-volume agent and application workloads.
Gemini 2.5 Flash
Google's speed-optimised tier. Cheap and fast multimodal, with a generous free tier on AI Studio for prototyping.
Gemini 2.5 Pro
Google's flagship multimodal model. Massive context window and competitive frontier-tier performance, with extended thinking on demand.
GPT-4o mini TTS
OpenAI's current text-to-speech model, built on GPT-4o mini. It replaces the older tts-1 line with better quality and a newer multimodal stack.
DeepSeek R1
DeepSeek's open-weight reasoning model. Released with full weights and a permissive MIT license — the first competitive open reasoning model.
Pika 2
Pika's video generation model. Differentiates with Scene Ingredients (drop in characters/objects across shots) — the right pick when you need character consistency across clips.
Phi-4
Microsoft's small model trained heavily on synthetic data. Punches above its 14B weight on reasoning and math; MIT-licensed and runs locally.
Sora
OpenAI's flagship video generation model. Up to 20-second clips at 1080p with strong prompt adherence and physics simulation.
Llama 3.3 70B
Meta's newest dense 70B open-weight model. Approaches frontier-tier performance on English-centric tasks while remaining self-hostable.
Qwen QwQ-32B
Alibaba's open-weight reasoning model. 32B parameters with a permissive Apache 2.0 license — the practical reasoning model that fits on a workstation.
Suno v4
Suno's flagship music generation model. Produces full songs (vocals + instrumentation) from a text prompt — the most polished consumer music AI.
Qwen2.5-Coder 32B
Alibaba's flagship open code model. 32B parameters and Apache 2.0 — the strongest open coding model that fits on a single workstation GPU.
Stable Diffusion 3.5
Stability AI's open image-gen family. Three sizes (Large, Large Turbo, Medium) — runs locally on consumer GPUs and supports a massive ecosystem of LoRAs and ControlNets.
Whisper large-v3 Turbo
OpenAI's distilled Whisper variant. ~8× faster than large-v3 with most of the accuracy retained — the practical default for high-throughput STT pipelines.
Llama 3.2 Vision
Meta's open-weight vision-language family. 11B and 90B variants — the practical open-weights vision model for self-hosted multimodal applications.
Qwen 2.5 72B
Alibaba's flagship open-weight model. Strong on coding, math, and Chinese-language tasks; competitive with Llama 3.3 70B on Western benchmarks.
Qwen 2.5 7B
Alibaba's small-tier open model. Apache-licensed, runs on consumer hardware, and remains competitive with other 7B-class models on coding and math.
Voyage 3
Voyage AI's flagship embedding model. Top of MTEB across many tasks; the embedding service Anthropic recommends for Claude RAG workloads.
NVIDIA Parakeet
NVIDIA's English-focused STT model. Top of HF Open ASR Leaderboard for English and very fast on NVIDIA hardware via NeMo.
FLUX.1 [pro]
Black Forest Labs' commercial flagship. Built on the same architecture as FLUX.1 [dev] with higher-quality tuning; the closed tier when you need commercial-use rights.
Llama 3.1 8B
Meta's small open-weight model. Runs on consumer hardware (16GB GPU or modern laptop) and remains a strong default for local-first AI.
GPT-4o mini
OpenAI's small, cheap, fast frontier model. The default workhorse for high-volume tasks where GPT-4o would be overkill.
DeepSeek-Coder V2
DeepSeek's open code model. MoE architecture with strong coverage across 338 programming languages; the open-weights coder of choice for high-end self-hosting.
Runway Gen-3
Runway's video generation model. Strong creative-tooling ecosystem (motion brush, camera control, style transfer); the production tool of choice for many video creators.
Luma Dream Machine
Luma AI's video generation model. Strong on cinematic camera moves and 3D-aware generation; the right pick when production-feel camera language matters.
GPT-4o
OpenAI's flagship multimodal model — text, vision, and realtime voice in one model. The default "omni" frontier model.
Cohere Rerank v3
Cohere's flagship reranker. The standard second-pass model after a vector or BM25 retrieval — bumps precision noticeably with minimal architecture changes.
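The second-pass pattern described here is provider-agnostic: fetch a broad candidate set with a cheap retriever, then rescore only the top candidates with the stronger model. A minimal control-flow sketch, where a trivial keyword-overlap scorer stands in for BM25/vector search and a placeholder `rerank` function stands in for the actual Rerank API call (both are illustrative, not Cohere's implementation):

```python
def first_pass_score(query: str, doc: str) -> float:
    """Cheap lexical overlap score (stands in for BM25 or vector search)."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

def rerank(query: str, docs: list[str]) -> list[str]:
    """Placeholder for the cross-encoder rerank call; in production this
    would be a single API request over the small candidate set."""
    return sorted(docs, key=lambda d: first_pass_score(query, d), reverse=True)

def retrieve(query: str, corpus: list[str],
             fetch_k: int = 20, top_n: int = 3) -> list[str]:
    # Stage 1: cheap scoring over the whole corpus; keep fetch_k candidates.
    candidates = sorted(corpus, key=lambda d: first_pass_score(query, d),
                        reverse=True)[:fetch_k]
    # Stage 2: expensive second-pass rescoring over the candidates only.
    return rerank(query, candidates)[:top_n]

corpus = [
    "the cat sat on the mat",
    "reranking improves retrieval precision",
    "vector search finds nearest neighbors",
    "the weather today is sunny",
]
results = retrieve("how does reranking improve retrieval", corpus, top_n=1)
```

The key cost lever is `fetch_k`: the reranker only ever sees that many documents per query, so first-pass recall bounds final quality.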
Udio
Suno's main competitor. Music generation with strong genre control and a focus on song-structure quality.
Stable Audio 2.5
Stability AI's audio generation model. Generates music tracks, sound effects, and audio loops from text prompts — Stability's alternative to Suno/Udio.
BGE Reranker v2
BAAI's open reranker. Apache 2.0 weights, multiple sizes, and the open-weights default for self-hosted RAG pipelines that need a second-pass reranker.
StarCoder2 15B
BigCode's collaborative code model. 15B parameters trained on 600+ programming languages; strong fit for IDE completion and self-hosted code search.
BGE-M3
BAAI's multi-functional embedding model. Supports dense, sparse, and multi-vector retrieval in one model — the strongest open-weights embedding option.
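Dense and sparse signals from a multi-vector model like this are typically consumed as a weighted score fusion. A toy sketch, with hand-written vectors and token-weight maps standing in for real encoder output (the `alpha` weight is an assumed tuning knob, not a BGE-M3 parameter):

```python
def dense_score(q_vec: list[float], d_vec: list[float]) -> float:
    """Dot product of (assumed unit-normalized) dense embeddings."""
    return sum(a * b for a, b in zip(q_vec, d_vec))

def sparse_score(q_weights: dict[str, float],
                 d_weights: dict[str, float]) -> float:
    """Lexical match: sum of weight products over shared tokens."""
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)

def hybrid_score(q: dict, d: dict, alpha: float = 0.7) -> float:
    """Weighted fusion of semantic (dense) and lexical (sparse) relevance."""
    return (alpha * dense_score(q["dense"], d["dense"])
            + (1 - alpha) * sparse_score(q["sparse"], d["sparse"]))

# Toy query/document representations standing in for real model output.
query = {"dense": [0.6, 0.8], "sparse": {"rerank": 1.2, "model": 0.4}}
doc = {"dense": [0.5, 0.86], "sparse": {"rerank": 0.9, "open": 0.3}}
score = hybrid_score(query, doc)
```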
OpenAI text-embedding-3-large
OpenAI's largest embedding model. 3072 dimensions, multilingual, and the default high-quality option for RAG and semantic search.
OpenAI text-embedding-3-small
OpenAI's small embedding tier. 1536 dimensions; the cheap default for most RAG and semantic-search workloads where quality is sufficient.
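Both tiers return plain float vectors, and the usual downstream operation is cosine similarity (OpenAI's embeddings come back unit-normalized, so this reduces to a dot product). A self-contained sketch with toy 4-dimensional vectors standing in for real 1536/3072-dimensional API output:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings from the API.
query = [0.1, 0.3, 0.5, 0.1]
doc_close = [0.1, 0.28, 0.52, 0.1]  # near-duplicate meaning
doc_far = [0.9, -0.2, 0.05, 0.4]    # unrelated meaning

assert cosine_similarity(query, doc_close) > cosine_similarity(query, doc_far)
```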
Magnific AI
The premium 'creative upscaler'. Invents detail at high zoom factors using diffusion priors — the right choice when you want an upscale that adds fidelity rather than just enlarging pixels.
Midjourney v6
Midjourney's flagship image generator. Strong artistic quality and a distinctive aesthetic; bound to its Discord and web product, no public API.
Whisper large-v3
High-accuracy multilingual speech-to-text. Best-in-class for non-English audio; the de-facto open baseline.
Cohere Embed v3
Cohere's flagship embedding model. Strong multilingual coverage and built-in support for compressed (int8/binary) embeddings — useful for cost-sensitive RAG.
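The int8/binary compression mentioned here rests on standard quantization ideas that can be shown without the API; the quantizers below are toy versions of the general technique, not Cohere's implementation:

```python
def to_int8(vec: list[float], hi: float = 1.0) -> list[int]:
    """Scalar quantization: map floats in [-hi, hi] to ints in [-127, 127]."""
    return [max(-127, min(127, round(x / hi * 127))) for x in vec]

def to_binary_bytes(vec: list[float]) -> bytes:
    """Binary quantization: 1 bit per dimension (the sign), packed into bytes."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    n_bytes = (len(vec) + 7) // 8
    return bits.to_bytes(n_bytes, "big")

vec = [0.12, -0.56, 0.99, -0.03, 0.40, 0.77, -0.81, 0.05]
q8 = to_int8(vec)               # 1 byte per dimension instead of 4 (float32)
packed = to_binary_bytes(vec)   # 8 dimensions -> 1 byte
```

int8 cuts storage roughly 4× versus float32 and binary roughly 32×, at a retrieval-quality cost that a rescoring pass over full-precision vectors can largely recover.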
Coqui XTTS v2
Coqui's open multilingual TTS. Supports voice cloning with a 6-second sample across 17 languages — the leading open alternative to ElevenLabs.
Distil-Whisper
Hugging Face's distillation of Whisper. ~6× faster than the original at small accuracy cost; English-only — pick when you don't need multilingual.
ElevenLabs Multilingual v2
ElevenLabs' flagship multilingual TTS. The benchmark for natural-sounding speech and voice cloning across 29+ languages.
Piper
A fast, local TTS engine designed for Raspberry Pi-class hardware. A common output stage in self-hosted voice assistants, typically paired with Whisper for input.
Stable Diffusion x4 Upscaler
Stability AI's diffusion-based 4× upscaler. Trades speed for quality — invents plausible high-frequency detail rather than just sharpening, which suits AI-generated images especially well.
SwinIR
Transformer-based image restoration model. Strong on text, edges, and faces — often produces sharper results than GAN-based upscalers on photographic content.
Real-ESRGAN
The de-facto open-weights image upscaler. Battle-tested across years of community use; runs on CPU or GPU, integrates with virtually every local image pipeline.
Vosk
An offline, lightweight speech recognition toolkit. Runs on phones, Raspberry Pi, and embedded devices — the right choice when Whisper is too heavy.
Topaz Gigapixel AI
Industry-standard desktop application for photo upscaling and restoration. The default tool for archival work, photo restoration, and print-sized enlargements where fidelity to the original matters.