Cyberax AI Playbook
cyberax.com
Comparison · Tool Decisions · Local-OK

Embeddings models compared for semantic search

An **embedding** is the way AI represents the meaning of a piece of text as a list of numbers, so similar things can be compared mathematically. An **embedding model** is the model that produces those numbers. This page compares five model lines — OpenAI text-embedding-3, Cohere embed v3, Voyage AI, Jina, and the open-source sentence-transformers family — on retrieval quality, dimension-and-cost trade-offs, and how to test them on your actual data before committing.

At a glance Last verified · May 2026
Problem solved Pick an embeddings model for semantic-search or RAG workloads — comparing OpenAI, Cohere, Voyage, Jina, and open-source options on retrieval quality, dimension cost, multilingual support, and self-host trade-offs
Best for Engineers building RAG or semantic-search systems, ML platform teams choosing infrastructure, and data teams running large clustering or matching workloads
Tools OpenAI text-embedding-3, Cohere embed-v3, Voyage AI, Jina, sentence-transformers
Difficulty Intermediate
Cost $0.02–$0.18 per 1M tokens (API embeddings) → $0 software cost (self-hosted) + infrastructure

Pick a good embedding model and your semantic-search, deduplication, classification, and RAG (retrieval-augmented generation — giving AI access to your specific documents) features work; pick a poor one and they don't.
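
The "compared mathematically" step is usually cosine similarity between the two vectors. A minimal sketch with made-up 4-dimensional vectors (real models emit hundreds to thousands of dimensions):

```python
from math import sqrt

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: near 1.0 means same meaning-direction, near 0.0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for three texts.
invoice = [0.9, 0.1, 0.0, 0.2]
bill    = [0.8, 0.2, 0.1, 0.3]
weather = [0.0, 0.1, 0.9, 0.0]

cosine_similarity(invoice, bill)     # high: similar meaning
cosine_similarity(invoice, weather)  # low: unrelated
```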

The embedding model decision quietly shapes every downstream retrieval choice. Once you’ve embedded 10 million documents in OpenAI text-embedding-3-large at 3,072 dimensions, you’re committed. Switching models means re-embedding the corpus and revalidating the vector database (a database that stores embeddings and lets you search by meaning). The “we’ll just pick one and move on” decision is the one that benefits most from honest comparison — the cost of getting it wrong is meaningful, and the marketing pages don’t surface the trade-offs that matter for your specific data.

Onward: retrieval-quality benchmarks, dimension-and-cost math, multilingual capability, self-hosting trade-offs, and the testing pattern that beats “pick the leader on MTEB and move on.”

Side by side

The comparison matrix

| | OpenAI text-embedding-3-large | Cohere embed-v3 | Voyage AI (voyage-3) | Jina v3 | sentence-transformers (open source) |
|---|---|---|---|---|---|
| Retrieval quality (MTEB benchmarks) | Strong; leading commercial scores at large size | Strong; close to OpenAI on most tasks | Strong; competitive top-tier benchmarks | Good; close behind the top commercial three | Variable by model; top open-source options reach top-commercial level on many tasks |
| Default dimension count | 3,072 (large) / 1,536 (small) — can be reduced via Matryoshka | 1,024 (embed-v3) — fixed | 1,024 (voyage-3) — fixed | 1,024 (jina-embeddings-v3) | Varies: 384 (MiniLM), 768 (all-mpnet), up to 1,024+ |
| Dimension reduction (Matryoshka) | Yes — reduce to 256 / 512 / 1,024 with modest quality loss | No native; static dimensions | No native | Yes — Matryoshka support for flexible truncation | Some models support it via training |
| Multilingual support | 100+ languages; strong on major languages | 100+ languages; strong on European and Asian languages | 100+ languages; documented strength on multilingual benchmarks | Strong multilingual; designed for it from v3 | Multilingual models exist; quality varies |
| Pricing (per 1M tokens) | $0.13 (3-large) / $0.02 (3-small) | $0.10 | $0.06 (voyage-3) / $0.02 (voyage-3-lite) / $0.18 (voyage-3-large) | $0.10 typical (v3); v4 pricing not publicly disclosed | $0 software; infrastructure cost only |
| Self-hostable | No | No | No | Some Jina models are open-weights | Yes; the primary positioning |
| Rerank companion API | No native rerank | Yes — Cohere Rerank (separate API) | Yes — voyage-rerank | Yes — Jina Rerank | Open-source rerank models available |
| API maturity | Excellent — broad SDK coverage | Excellent — well-documented | Strong; smaller ecosystem than OpenAI | Strong; growing ecosystem | Hugging Face Transformers + many wrappers |
| Best for | General-purpose; teams already on OpenAI | Search + rerank workflows; multilingual | Long-context (up to 32k input tokens) | Multilingual; cost-conscious commercial | Self-hosted, privacy-sensitive, custom-trained |
| Max input context | 8,192 tokens | 512 tokens (embed-v3); 128,000 (embed-v4 — current flagship) | Up to 32,000 tokens (voyage-3 family) | 8,192 (v3); 32,768 (v4 — current flagship) | Typically 512 tokens; depends on model |
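
Matryoshka reduction amounts to keeping the leading dimensions and re-normalising. A sketch under the assumption of a Matryoshka-trained model, where the early dimensions carry the most information (the 8-dimension vector is a toy; real models document which truncation points they support):

```python
from math import sqrt

def truncate_matryoshka(embedding: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components, then re-normalise to unit length.
    Only meaningful for Matryoshka-trained models."""
    head = embedding[:dims]
    norm = sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.5, 0.5, 0.5, 0.5, 0.01, 0.02, 0.01, 0.02]  # toy 8-dim vector
reduced = truncate_matryoshka(full, 4)
len(reduced)  # 4
```

Storage and similarity-compute drop in proportion to the truncation, which is where the dimension-for-cost trade-off comes from.
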
The decision

What to actually use

For general-purpose RAG with broad ecosystem support — OpenAI text-embedding-3-large or text-embedding-3-small. Ecosystem support is the widest of any option: every major vector store ships native OpenAI integration, and Matryoshka reduction lets you trade dimensions for cost. Trade-off: not the cheapest at high volume, and no native rerank.

For semantic-search workflows that benefit from a paired rerank model — Cohere embed-v3 with Cohere Rerank. The two-stage retrieve-then-rerank pattern with both from one vendor is well-documented; quality is competitive with OpenAI. Right for production search systems where rerank materially improves precision.
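
In outline, the retrieve-then-rerank pattern looks like this; the word-overlap scorer is a stand-in for a real embedding-similarity call and a real rerank API call:

```python
def retrieve_then_rerank(query, docs, embed_score, rerank_score,
                         k_retrieve=50, k_final=5):
    """Stage 1: cheap similarity scoring over the whole corpus.
    Stage 2: expensive reranking over the shortlist only."""
    shortlist = sorted(docs, key=lambda d: embed_score(query, d), reverse=True)[:k_retrieve]
    return sorted(shortlist, key=lambda d: rerank_score(query, d), reverse=True)[:k_final]

# Toy scorer standing in for real embedding and rerank calls.
def word_overlap(query: str, doc: str) -> int:
    return len(set(query.split()) & set(doc.split()))

docs = ["refund policy details", "shipping times", "how to request a refund", "careers page"]
retrieve_then_rerank("refund", docs, word_overlap, word_overlap, k_retrieve=3, k_final=2)
```

The point of the pattern is cost shape: the reranker only ever sees `k_retrieve` candidates, so its per-query cost stays flat as the corpus grows.
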

For long-context input (entire articles or documents per embedding) — Voyage AI’s voyage-3 with 32k input. Most other models cap at 512 or 8,192 tokens; voyage handles much longer inputs as a single embedding, which simplifies architectures that would otherwise need chunking. Right for whole-document retrieval workflows.
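
Where a 512- or 8,192-token cap forces chunking, the usual workaround is overlapping windows. A minimal sketch, with token lists standing in for a real tokenizer's output:

```python
def chunk_tokens(tokens: list[str], max_len: int = 512, overlap: int = 64) -> list[list[str]]:
    """Split a token sequence into overlapping windows that fit the model's
    input limit. Long-context models can often skip this step entirely."""
    step = max_len - overlap
    return [tokens[i:i + max_len] for i in range(0, max(len(tokens) - overlap, 1), step)]

article = [f"tok{i}" for i in range(1000)]  # stands in for real tokenizer output
chunks = chunk_tokens(article)
len(chunks)  # 3 windows, each at most 512 tokens
```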

For multilingual workloads at moderate cost — Jina v3 or Cohere embed-v3. Both have documented strength on multilingual benchmarks; Jina trades slight quality for slightly lower cost at high volume. Right when your corpus spans many languages and per-language quality matters.

For privacy-sensitive or self-hosted operation — sentence-transformers (open-source). The all-mpnet-base-v2 and BGE family are competitive on quality with the commercial mid-tier, free software, and self-hostable on modest hardware. Right when data can’t leave your infrastructure (regulated industries, on-prem requirements).

For very-high-volume cost optimisation — either OpenAI text-embedding-3-small (good quality, low cost) or self-hosted open-source. At 100M+ embeddings, the cost differences compound significantly; rerun the math on your actual volume before defaulting.
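
Rerunning the math is one line of arithmetic against the per-1M-token rates quoted on this page:

```python
def embedding_cost_usd(total_tokens: int, price_per_million_usd: float) -> float:
    """API embedding cost at a given per-1M-token rate."""
    return total_tokens / 1_000_000 * price_per_million_usd

tokens = 100_000_000 * 1_000  # 100M documents averaging 1k tokens each
embedding_cost_usd(tokens, 0.13)  # text-embedding-3-large: ~$13,000
embedding_cost_usd(tokens, 0.02)  # text-embedding-3-small: ~$2,000
```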

The numbers

What you'll actually pay

OpenAI text-embedding-3-large $0.13 per 1M tokens — top quality, broadest ecosystem
OpenAI text-embedding-3-small $0.02 per 1M tokens — strong quality at much lower cost
Cohere embed-v4 $0.10 per 1M tokens — current flagship; 128k input context, multimodal
Voyage AI voyage-3 $0.06 per 1M tokens (older tier); 32k input context
Voyage AI voyage-3-lite $0.02 per 1M tokens; competitive on cost
Voyage AI voyage-3-large $0.18 per 1M tokens (top quality in the voyage line)
Jina embeddings v4 Current flagship; 32,768 token input; pricing not publicly disclosed
Self-hosted (sentence-transformers) $0 software; GPU infra cost typically $200–$1,000/month at scale
Embedding 1M documents (avg 1k tokens each) $130 (OpenAI large), $20 (OpenAI small), $100 (Cohere v4), $60 (Voyage-3), $180 (Voyage-3-large)
Re-embedding cost when changing models (1M docs) Same as initial cost — the switching cost is meaningful at large scale
Storage cost — vector store at 1M docs × dimension 1M float32 vectors ≈ 12.3 GB raw at 3,072 dimensions vs ≈ 4.1 GB at 1,024; Matryoshka reduction saves significantly

The per-token cost is small; the dimension-driven storage cost adds up at scale. Choose the dimension that fits your retrieval quality bar — higher isn’t always better, especially after rerank.
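
The dimension-driven storage cost is quick to estimate. This sketch covers raw vectors only; index overhead on top is an assumption that varies by vector store and index type:

```python
def vector_storage_gb(num_vectors: int, dims: int, bytes_per_value: int = 4) -> float:
    """Raw vector storage at float32 (4 bytes per value), before any
    index overhead, which comes on top and varies by vector store."""
    return num_vectors * dims * bytes_per_value / 1e9

vector_storage_gb(1_000_000, 3072)  # ~12.3 GB at full text-embedding-3-large size
vector_storage_gb(1_000_000, 1024)  # ~4.1 GB after Matryoshka reduction
```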

What changes between now and the next refresh

Volatility notes

  • New models frequently. OpenAI, Cohere, Voyage, Jina all iterate; quality leaders shift on benchmarks every few months.
  • Open-source catching up. Top open-source models (BGE, GTE, E5) regularly match mid-tier commercial on MTEB.
  • Specialised embeddings. Domain-specific models (legal, medical, code) emerging with documented quality lifts on their domains.

Re-verify every 6 months for cost; quality benchmarks shift faster than pricing.

What's next

Related work

For the underlying embeddings mental model, see Embeddings explained without math. For the vector-database choice that stores these embeddings, see Vector databases compared. For the broader RAG pattern these power, see RAG explained without acronyms. For the internal-Q&A bot architecture that combines embeddings with generation, see Internal Q&A bot over company docs.

Common questions

FAQ

How do we test embedding models on our actual data?

Build a test set of ~50 query-result pairs from your domain (queries plus the documents that should match). Embed your corpus with each candidate model; for each query, measure recall@5 and recall@10 against the ground-truth matches. The model that wins on your data may differ from the MTEB leader — your specific domain shape matters more than generic benchmarks.
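
The recall measurement reduces to a few lines, averaged over the ~50 queries per candidate model:

```python
def recall_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of ground-truth matches that appear in the top-k results."""
    return len(set(ranked_ids[:k]) & relevant_ids) / len(relevant_ids)

# One query from the test set: the candidate model ranked these doc ids;
# ground truth says doc-2 and doc-9 should match.
ranked = ["doc-2", "doc-7", "doc-1", "doc-9", "doc-4", "doc-3"]
truth = {"doc-2", "doc-9"}
recall_at_k(ranked, truth, 5)  # 1.0 (both ground-truth docs in the top 5)
recall_at_k(ranked, truth, 2)  # 0.5 (only doc-2 in the top 2)
```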

Can we mix embedding models in one system?

Generally no — different models produce different vector spaces, and cosine similarity across models is meaningless. The exception: keep separate indexes per model and combine results at a higher level (hybrid retrieval, rerank). Most teams pick one model and commit; switching means re-embedding the corpus.
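
Combining results at a higher level is often done with reciprocal rank fusion, which merges rank positions rather than raw scores, sidestepping the incompatible vector spaces. A minimal sketch:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists from separate per-model indexes by rank position,
    since raw similarity scores across models aren't comparable."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

index_a = ["a", "b", "c"]  # top results from model A's index
index_b = ["c", "a", "d"]  # top results from model B's index
reciprocal_rank_fusion([index_a, index_b])
```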

What about dimension count — bigger is better?

Not always. Higher-dimension embeddings cost more to store and compute against; the quality difference at top sizes is often small. Matryoshka-trained models let you truncate to lower dimensions with modest quality loss. Test on your data — many teams find 512–1,024 dimensions sufficient for production retrieval.

Should we self-host open-source embeddings?

If you have ML infra capacity and care about cost-at-scale or data control, yes. If you don't, the cost of running the infrastructure exceeds the API cost for most workloads. Self-host when you're at 50M+ embeddings or have strict data-residency requirements; use the API otherwise.


Change history (1 entry)
  • 2026-05-13 Initial publication.