Models
A reference catalog of the AI models we use, recommend, or compare in the playbook — what they're for, where to try them, when we last verified each entry.
Claude Fable 5
Anthropic's most capable widely released model — the first public Mythos-class model, a tier above Opus 4.8 for the most demanding reasoning and long-horizon agentic work.
Claude Mythos 5
Anthropic's restricted frontier model — the same underlying model as Claude Fable 5 with safeguards lifted in select domains. Invitation-only through Project Glasswing, succeeding Claude Mythos Preview.
Claude Opus 4.8
Anthropic's most capable Opus-tier model for complex reasoning, long-horizon agentic coding, and high-autonomy work. The flagship of the standard Claude lineup, succeeding Opus 4.7; Claude Fable 5 sits in a new tier above it.
Stable Audio 3.0
Stability AI's open-weight audio model. Generates music tracks, sound effects, and audio loops from text prompts; the open alternative to Suno/Udio. (The small and medium tiers are released with open weights; large is API-only.)
Gemini 3.5 Flash
Google's current top Flash-tier Gemini — sustained frontier-level intelligence for agentic and coding tasks at high speed and low cost. The GA successor to the Gemini 3 Flash preview.
Gemini 3.1 Flash-Lite
Google's low-latency Gemini 3-series workhorse for straightforward multimodal tasks at scale. It is designed for high-frequency agent routing, extraction, translation, and summarization work.
Gemini 3.1 Flash Live Preview
Google's current low-latency audio-to-audio Live API model for real-time dialogue and voice-first applications. It replaces the earlier Gemini Live surface with the Gemini 3.1 stack.
Gemini 3.1 Pro Preview
Google's top Gemini 3-series model for advanced reasoning, coding, and agentic workflows. It improves the Gemini 3 Pro line with better thinking, tool use, and factual consistency.
Veo 3.1
Google's flagship video generation model. Adds advanced creative controls and improved prompt adherence on top of the Veo 3 native-audio foundation.
Veo 3.1 Lite
Google's efficient, developer-first variant of Veo 3.1. Lower cost and faster generation; same family as the main 3.1 model with reduced fidelity for tighter feedback loops.
DeepSeek V4 Flash
DeepSeek's efficient tier of the V4 generation. Faster and cheaper than V4 Pro; the practical default for high-throughput agentic workloads.
DeepSeek V4 Pro
DeepSeek's flagship general-purpose MoE model. Successor to V3; competitive with closed frontier-tier models at open-weights cost.
Midjourney V8.1
Midjourney's flagship image generator. V8.1 is its fastest model yet (2K HD by default), with strong artistic quality and a distinctive aesthetic. It is available only through Midjourney's Discord and web product, with no public API.
GPT-5.5
OpenAI's flagship model — a unified general-purpose and reasoning model positioned as a new class of intelligence for coding and professional work. The current successor to the GPT-4o / o-series lines.
GPT Image 2
OpenAI's current image generation and editing model. It replaces the older DALL·E line with a single state-of-the-art model for image creation and edits.
Qwen3.6 35B-A3B
Alibaba's current open-weight flagship from the Qwen3.6 generation. MoE architecture with 35B total / 3B active parameters; natively multimodal (text + vision) with up to ~1M-token context.
Gemma 4 31B
Google's flagship open-weight Gemma 4. Natively multimodal (text + vision); supersedes Gemma 3 with significantly improved capabilities and a larger size.
Mistral Medium 3.5
Mistral's frontier-class multimodal model for agentic and coding use cases. It sits between the largest flagship tier and the lighter Small line while remaining open-weight.
Suno v5.5
Suno's flagship music generation model. v5.5 adds Voices, Custom Models, and personalization; produces full songs (vocals + instrumentation) from a text prompt — the most polished consumer music AI.
GPT-5.4 mini
OpenAI's strongest small model for coding, computer use, and subagents. The efficient, lower-cost tier of the GPT-5 line and the named replacement for o4-mini.
Mistral Small 4
Mistral's hybrid open model that unifies instruct, reasoning, and coding in a single efficient line. It is the current small generalist flagship in Mistral's open lineup.
Claude Sonnet 4.6
Anthropic's mid-tier model — the practical default for production workloads. Balances quality and cost for most applications.
FLUX.2 [dev]
Black Forest Labs' next-generation open-weight image model. Supersedes FLUX.1 [dev] with improved quality and control; remains the open-weights choice for self-hosted image generation.
GPT-5.3-Codex
OpenAI's most capable agentic coding model — tuned for long-horizon software engineering, tool use, and the Codex agent surface.
Kling 3.0
Kuaishou's video generation model. 3.0 adds native audio, multimodal input, and up to 15-second clips; strong on human motion and physical realism.
ElevenLabs Eleven v3
ElevenLabs' flagship expressive text-to-speech model. The benchmark for natural-sounding speech and voice cloning across 70+ languages; Flash/Turbo v2.5 cover low-latency use.
Voyage 4
Voyage AI's flagship embedding model (voyage-4-large). Top of MTEB across many tasks; the embedding service Anthropic recommends for Claude RAG workloads.
OCR 3
Mistral's current OCR service for its Document AI stack. It extracts interleaved text and images from documents and replaces the older Mistral OCR line.
Cohere Rerank v4
Cohere's flagship reranker (rerank-v4.0-pro). The standard second-pass model after vector or BM25 retrieval: 100+ languages, 32K per-doc context, and a noticeable precision gain with minimal architectural changes.
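The retrieve-then-rerank pattern behind a second-pass reranker can be sketched in a few lines. This is a toy illustration under stated assumptions: `first_pass_score` stands in for a vector or BM25 retriever (here, plain term overlap), and `rerank_score` is a hypothetical stand-in for a cross-encoder such as Cohere Rerank or BGE Reranker. Neither function calls any real API.

```python
from typing import Callable


def first_pass_score(query: str, doc: str) -> float:
    """Cheap first-pass score: fraction of query terms present in the doc
    (a stand-in for BM25 or vector similarity)."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def rerank_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder reranker: rewards the exact
    query phrase, then term overlap, with a small penalty for longer docs."""
    phrase_bonus = 1.0 if query.lower() in doc.lower() else 0.0
    return phrase_bonus + first_pass_score(query, doc) - 0.001 * len(doc.split())


def retrieve_then_rerank(
    query: str,
    corpus: list[str],
    k_retrieve: int = 3,
    k_final: int = 2,
    reranker: Callable[[str, str], float] = rerank_score,
) -> list[str]:
    # Stage 1: score every document cheaply and keep a wide candidate set.
    candidates = sorted(corpus, key=lambda d: first_pass_score(query, d), reverse=True)[:k_retrieve]
    # Stage 2: run the more expensive reranker only on those candidates.
    return sorted(candidates, key=lambda d: reranker(query, d), reverse=True)[:k_final]
```

The design point is cost: the reranker only sees `k_retrieve` candidates, so a slower, more accurate model can be swapped in for `reranker` without touching the first-pass index.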
Runway Gen-4.5
Runway's flagship video generation model. Strong creative-tooling ecosystem (motion brush, camera control, style transfer); the production tool of choice for many video creators.
Devstral 2
Mistral's frontier coding-agent model for software engineering tasks. It is designed for tool-heavy coding workflows that span whole repositories and multi-file edits.
Ministral 3 14B
The largest model in Mistral's Ministral 3 family. It is built for local deployment with strong text and vision performance on diverse hardware.
Mistral Large 3
Mistral AI's flagship model. Successor to Mistral Large 2 with improved multilingual coverage and reasoning. EU-jurisdiction provider.
FLUX.2 [pro]
Black Forest Labs' commercial flagship (FLUX.2 [pro]). The closed top tier when you need commercial-use rights and maximum quality; FLUX.2 [dev] is the open-weight sibling.
Claude Haiku 4.5
Anthropic's fast, cheap tier. The right choice for high-throughput agentic work and tasks where latency matters more than depth.
Sora 2
OpenAI's video generation model. Sora 2 adds synchronized audio and stronger physics over the original Sora. Note: OpenAI is winding the Sora product down — the consumer app closed and the API is slated to shut down 2026-09-24.
Luma Ray3
Luma AI's flagship video model (behind the Dream Machine app). Ray3 is reasoning-driven with native HDR; strong on cinematic camera moves and 3D-aware generation.
Magistral Medium 1.2
Mistral's frontier-class multimodal reasoning model. It is the dedicated reasoning line for deeper multi-step analysis where Mistral Small 4 or Medium 3.5 would be too shallow.
Codestral 25.08
Mistral's current code-completion model, released at the end of July 2025. It is tuned for low-latency fill-in-the-middle and high-frequency code generation tasks.
Llama 4 Maverick
Meta's flagship Llama 4 model — natively multimodal, larger MoE architecture than Scout. The Llama 4 frontier-tier entry.
Llama 4 Scout
Meta's small/efficient Llama 4 variant — natively multimodal MoE architecture. The practical Llama 4 entry point for self-hosted multimodal applications.
NVIDIA Parakeet
NVIDIA's STT family. parakeet-tdt-0.6b-v2 tops the HF Open ASR leaderboard for English; parakeet-tdt-0.6b-v3 adds 25-language multilingual support. Very fast on NVIDIA hardware via NeMo.
o3
OpenAI's o-series flagship reasoning model, since succeeded by the GPT-5 line. Uses extended chain-of-thought before answering, trading latency for depth on math, science, and complex coding tasks.
Cohere Embed v4
Cohere's flagship embedding model (embed-v4.0). Multimodal (text + image) with up to 128K context; strong multilingual coverage and compressed (int8/binary) embeddings — useful for cost-sensitive RAG.
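Compressed embeddings save storage by keeping fewer bits per dimension. The sketch below shows generic sign-based binary quantization compared by Hamming distance; it is an illustration of the idea under our own assumptions, not Cohere's implementation, and the toy vectors are made up. For scale, a 1536-dimension float32 vector takes 6,144 bytes, while its 1-bit-per-dimension form packs into 192 bytes, a 32× reduction.

```python
def binarize(vec: list[float]) -> int:
    """Pack one bit per dimension (1 if the value is positive) into an int."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Count of differing bits between two packed vectors; lower is more similar."""
    return bin(a ^ b).count("1")


def packed_bytes(dim: int, bits_per_dim: int) -> int:
    """Storage per vector in bytes at a given precision (32 = float32, 8 = int8, 1 = binary)."""
    return dim * bits_per_dim // 8


# Toy usage: doc_a shares every sign with the query, doc_b flips every sign.
query = [0.2, -0.1, 0.5, -0.3]
doc_a = [0.1, -0.2, 0.4, -0.1]
doc_b = [-0.3, 0.2, -0.1, 0.4]
print(hamming(binarize(query), binarize(doc_a)), hamming(binarize(query), binarize(doc_b)))
```

In practice binary search is often used as a fast first pass, with int8 or float vectors used to rescore the top hits.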
GPT-4.1
OpenAI's developer-first frontier model for coding, instruction following, and long-context work. It is the API-oriented successor line to older GPT-4 variants.
GPT-4.1 mini
OpenAI's smaller GPT-4.1 variant. It keeps the 1M-token context window while lowering cost and latency enough for high-volume agent and application workloads.
Gemini 2.5 Flash
Google's speed-optimized Gemini 2.5 tier. A cheap, fast multimodal model with a generous free tier on AI Studio for prototyping.
Gemini 2.5 Pro
Google's previous-generation flagship multimodal model. A 1M-token context window and competitive frontier-tier performance, with extended thinking on demand.
GPT-4o mini TTS
OpenAI's current text-to-speech model, built on GPT-4o mini. It replaces the older tts-1 line with better quality and a newer multimodal stack.
GPT-4o Transcribe
OpenAI's hosted speech-to-text model, built on GPT-4o. The API-recommended transcription model, with lower word-error rate and better language recognition than the original Whisper API.
Udio Allegro v1.5
Udio's music generation model; Udio is Suno's main competitor. Allegro v1.5 is the current model, with faster generation, strong genre control, and a focus on song-structure quality.
Pika 2.2
Pika's video generation model. 2.2 adds Pikaframes keyframe transitions and 1080p; differentiates with Scene Ingredients (drop in characters/objects across shots) for character consistency across clips.
DeepSeek R1
DeepSeek's open-weight reasoning model. Released with full weights and a permissive MIT license — the first competitive open reasoning model.
Phi-4
Microsoft's small model trained heavily on synthetic data. Punches above its 14B weight on reasoning and math; MIT-licensed and runs locally.
Llama 3.3 70B
Meta's latest 70B-parameter open-weight model. Reaches frontier-tier performance for English-centric tasks while remaining self-hostable.
Qwen2.5-Coder 32B
Alibaba's flagship open code model. 32B parameters and Apache 2.0 — the strongest open coding model that fits on a single workstation GPU.
Stable Diffusion 3.5
Stability AI's open image-gen family. Three sizes (Large, Large Turbo, Medium) — runs locally on consumer GPUs and supports a massive ecosystem of LoRAs and ControlNets.
Whisper large-v3 Turbo
OpenAI's pruned and fine-tuned Whisper variant (4 decoder layers instead of large-v3's 32). ~8× faster than large-v3 with most of the accuracy retained; the practical default for high-throughput STT pipelines.
Llama 3.2 Vision
Meta's open-weight vision-language family. 11B and 90B variants — the practical open-weights vision model for self-hosted multimodal applications.
Qwen 2.5 72B
Alibaba's previous-generation flagship open-weight model. Strong on coding, math, and Chinese-language tasks; competitive with Llama 3.3 70B on Western benchmarks.
Qwen 2.5 7B
Alibaba's small-tier open model. Apache-licensed, runs on consumer hardware, and remains competitive with other 7B-class models on coding and math.
Llama 3.1 8B
Meta's small open-weight model. Runs on consumer hardware (16GB GPU or modern laptop) and remains a strong default for local-first AI.
GPT-4o mini
OpenAI's small, cheap, fast GPT-4o-generation model. A long-standing workhorse for high-volume tasks where GPT-4o would be overkill.
DeepSeek-Coder V2
DeepSeek's open code model. MoE architecture with strong coverage across 338 programming languages; the open-weights coder of choice for high-end self-hosting.
BGE Reranker v2
BAAI's open reranker. Apache 2.0 weights in multiple sizes; the open-weights default for self-hosted RAG pipelines that need a second-pass reranking stage.
StarCoder2 15B
BigCode's collaborative code model. 15B parameters trained on 600+ programming languages; strong fit for IDE completion and self-hosted code search.
BGE-M3
BAAI's multi-functional embedding model. Supports dense, sparse, and multi-vector retrieval in one model — the strongest open-weights embedding option.
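One common way to use a model that emits dense, sparse, and multi-vector representations is to score each mode separately and combine them with a weighted sum. The sketch below assumes the representations are already computed; the toy vectors, token weights, and the 0.4/0.2/0.4 weighting are illustrative assumptions, not BGE-M3 output or its official API. Dense uses cosine similarity, sparse sums weight products over shared tokens, and multi-vector uses a ColBERT-style MaxSim averaged over query tokens.

```python
import math


def normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def dense_score(q_vec: list[float], d_vec: list[float]) -> float:
    # Cosine similarity between the single pooled vectors.
    return dot(normalize(q_vec), normalize(d_vec))


def sparse_score(q_weights: dict[str, float], d_weights: dict[str, float]) -> float:
    # Lexical match: sum of weight products over tokens present in both.
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)


def maxsim_score(q_vecs: list[list[float]], d_vecs: list[list[float]]) -> float:
    # Multi-vector: each query token takes its best-matching doc token,
    # then the best matches are averaged over query tokens.
    d_norm = [normalize(d) for d in d_vecs]
    best = [max(dot(normalize(q), d) for d in d_norm) for q in q_vecs]
    return sum(best) / len(best)


def hybrid_score(q: dict, d: dict, weights: tuple[float, float, float] = (0.4, 0.2, 0.4)) -> float:
    # Weighted sum of the three modes; weights here are illustrative only.
    w_dense, w_sparse, w_multi = weights
    return (
        w_dense * dense_score(q["dense"], d["dense"])
        + w_sparse * sparse_score(q["sparse"], d["sparse"])
        + w_multi * maxsim_score(q["multi"], d["multi"])
    )
```

Because all three scores come from one forward pass of the same model, the weights can be tuned per corpus without re-indexing.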
OpenAI text-embedding-3-large
OpenAI's largest embedding model. 3072 dimensions, multilingual, and the default high-quality option for RAG and semantic search.
OpenAI text-embedding-3-small
OpenAI's small embedding tier. 1536 dimensions; the cheap default for most RAG and semantic-search workloads where quality is sufficient.
Magnific AI
The premium 'creative upscaler'. Invents detail at high zoom factors using diffusion priors; the right choice when you want an upscale that adds plausible new detail rather than just enlarging pixels.
Whisper large-v3
High-accuracy multilingual speech-to-text. Best-in-class for non-English audio; the de-facto open baseline.
Coqui XTTS v2
Coqui's open multilingual TTS. Supports voice cloning with a 6-second sample across 17 languages — the leading open alternative to ElevenLabs.
Distil-Whisper
Hugging Face's distillation of Whisper (distil-large-v3.5). Faster than Whisper-large-v3-Turbo on long-form audio at small accuracy cost; English-only — pick when you don't need multilingual.
Piper
A fast, local TTS designed for Raspberry Pi-class hardware. Powers many self-hosted voice assistants, with Whisper handling speech input and Piper handling speech output.
Stable Diffusion x4 Upscaler
Stability AI's diffusion-based 4× upscaler. Trades speed for quality — invents plausible high-frequency detail rather than just sharpening, which suits AI-generated images especially well.
SwinIR
Transformer-based image restoration model. Strong on text, edges, and faces — often produces sharper results than GAN-based upscalers on photographic content.
Real-ESRGAN
The de-facto open-weights image upscaler. Battle-tested across years of community use; runs on CPU or GPU, integrates with virtually every local image pipeline.
Vosk
An offline, lightweight speech recognition toolkit. Runs on phones, Raspberry Pi, and embedded devices — the right choice when Whisper is too heavy.
Topaz Gigapixel AI
Industry-standard desktop application for photo upscaling and restoration. The default tool for archival work, photo restoration, and print-sized enlargements where fidelity to the original matters.