Cyberax AI Playbook
cyberax.com
Explainer · Foundations · Local-OK

Vendor lock-in risks with AI

Every AI product you wire into your operation creates a small dependency on the vendor; the cumulative dependency over 18 months becomes a strategic exposure. This piece covers where lock-in lives in modern AI stacks, the realistic mitigation patterns, and which lock-in is acceptable vs unacceptable for which workloads.

At a glance
Last verified · May 2026
Problem solved · Evaluate where AI vendor lock-in sits in your stack — model APIs, embedding spaces, fine-tuned customisations, training data — and apply mitigations that preserve flexibility without imposing unbearable engineering overhead
Best for · Engineering leaders, founders making AI-stack decisions, CIOs at companies past series A, anyone building product features on AI infrastructure
Tools · OpenAI, Anthropic, Google Cloud AI, AWS Bedrock, Azure OpenAI, Llama, Mistral
Difficulty · Intermediate
Cost · Mitigation costs vary, from $0 (architectural choices) to $10,000+/year (multi-vendor infrastructure)

An engineering team adds three AI features over 18 months. They pick OpenAI for the chat assistant, Pinecone for the vector database (storage tuned for AI search), and Anthropic for the agent workflow. Each choice is reasonable in isolation. Then OpenAI raises prices, Anthropic deprecates the model the agent depends on, and a Pinecone outage takes search down for half a day. Migrating off any one of them is now a multi-week project. The cumulative dependency built up quietly, choice by choice, and became strategic exposure.

This piece is the honest analysis. Where lock-in actually sits in AI stacks (four levels). The realistic mitigation patterns (most are simpler than they look). And which lock-in is acceptable vs unacceptable for which workloads — not “avoid all lock-in,” but selective mitigation against the highest-impact risks.

The mental model

Where lock-in actually lives

AI-stack lock-in lives at four levels:

  • Model API lock-in. Your code calls a specific vendor’s API. The cost of switching is “rewrite the API calls” — usually modest engineering work, especially with abstraction libraries (LiteLLM, LangChain). Low to moderate severity.

  • Capability lock-in. Some vendors offer features that don’t exist elsewhere — long-context, specific tool integrations, particular reasoning patterns. Migrating to a different vendor means losing the feature or building it yourself. Moderate severity, vendor-specific.

  • Embedding-space lock-in. Embeddings stored in a vector store with one model are incompatible with another model’s embeddings. Re-embedding 100 million documents to switch is a real cost — engineering work plus the embedding cost itself. High severity at scale.

  • Customisation lock-in. Fine-tuned models on proprietary infrastructure can’t transfer to another vendor. The training data preparation and fine-tuning work are sunk costs. High severity if you’ve invested heavily.

Each level has different mitigation patterns; the cost of mitigation differs widely too.

The mitigation patterns

What actually reduces lock-in risk

Model API abstraction layer. Use a library like LiteLLM, Vercel AI SDK, or LangChain that abstracts the vendor-specific API. The cost is small (a wrapper layer) and the benefit is large: switching the underlying model becomes a config change, not a rewrite. Apply this for any production AI workload.
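The pattern is simple enough to sketch without any library at all: application code calls a neutral `chat()` function, and the vendor binding lives in config. This is a minimal illustration with stubbed provider clients — in practice the stubs would be replaced by LiteLLM or the vendors' own SDKs, and the function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Message = Dict[str, str]

# Hypothetical stub clients standing in for real vendor SDK calls.
def _call_openai(model: str, messages: List[Message]) -> str:
    return f"[openai:{model}] reply"

def _call_anthropic(model: str, messages: List[Message]) -> str:
    return f"[anthropic:{model}] reply"

PROVIDERS: Dict[str, Callable[[str, List[Message]], str]] = {
    "openai": _call_openai,
    "anthropic": _call_anthropic,
}

@dataclass
class ChatConfig:
    provider: str  # switching vendors = changing this config value
    model: str

def chat(config: ChatConfig, messages: List[Message]) -> str:
    """Route a chat request to whichever provider the config names."""
    try:
        call = PROVIDERS[config.provider]
    except KeyError:
        raise ValueError(f"unknown provider: {config.provider}")
    return call(config.model, messages)

# Application code depends only on chat(); the vendor choice lives in config.
cfg = ChatConfig(provider="openai", model="gpt-4o-mini")
print(chat(cfg, [{"role": "user", "content": "hello"}]))
```

Swapping `provider="openai"` for `provider="anthropic"` is the whole migration at this layer — which is exactly the property the abstraction buys.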

Multi-vendor capability evaluation. Before committing to a vendor-specific feature, evaluate the equivalent on alternative vendors. Often the differentiating capability is matched (or close enough) elsewhere, and the architectural decision can accommodate either. If the feature is genuinely unique and central to your product, the lock-in is acceptable; if it’s marginal, abstract around it.

Embedding-store discipline. Pick an embedding model with explicit migration considerations. OpenAI text-embedding-3 with Matryoshka dimensions, Cohere with stable versioning, or self-hosted open-source models with reproducible weights. Avoid pinning to embedding models with unclear backward-compatibility policies.

Fine-tuning portability. Where possible, use open-source base models for fine-tuning so the trained weights remain yours. Vendor fine-tuning APIs (OpenAI, Anthropic) produce models tied to their infrastructure; the work doesn’t transfer. Decide which lock-in is acceptable per workload.

Periodic vendor re-evaluation. Set a calendar reminder every six months to re-evaluate your AI vendor decisions. Pricing, capability, and policy shifts are frequent; the right answer changes. Without that cadence, you ossify on early decisions.

Which lock-in is acceptable

Calibrating the trade-offs

Not all lock-in is bad. The mitigation has engineering and cost overhead; over-mitigating produces brittle abstractions that fight the vendor’s actual capabilities. The framework:

  • For workloads where you genuinely depend on one vendor’s flagship capability (very long context, specific reasoning depth, unique tool integrations), the lock-in is the cost of access to the capability. Accept it; the alternative is shipping a worse product.

  • For routine workloads (classification, summarisation, structured extraction), the capability is broadly equivalent across vendors. Mitigate lock-in here aggressively; nothing valuable is being protected by single-vendor dependency.

  • For workloads with large embedded data (RAG over millions of documents, vector-search products), the embedding-store lock-in is the highest-impact risk. Invest in mitigation early; retrofitting is expensive.

  • For workloads with regulatory or compliance requirements, the vendor’s compliance posture is the binding constraint. Lock-in is acceptable if the vendor’s posture meets your needs; verify the alternative vendors’ posture before relying on a switch option.
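The four cases above can be condensed into a small decision helper. This is an illustrative encoding of the framework, not a complete policy; the ordering (compliance first, then embedded data) is my reading of which constraints bind hardest, and every name here is a sketch.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    needs_unique_capability: bool  # a flagship-only feature is core to the product
    routine: bool                  # classification / summarisation / extraction
    large_embedded_corpus: bool    # RAG over millions of documents
    regulated: bool                # compliance posture is the binding constraint

def mitigation_posture(w: Workload) -> str:
    """Map the framework's four cases to a recommended posture (illustrative)."""
    if w.regulated:
        return "verify alternatives' compliance posture before relying on a switch"
    if w.large_embedded_corpus:
        return "invest in embedding-migration mitigation early"
    if w.needs_unique_capability:
        return "accept lock-in as the cost of access to the capability"
    if w.routine:
        return "mitigate aggressively; vendors are broadly interchangeable here"
    return "default: API abstraction layer plus periodic re-evaluation"
```

Encoding the framework this way also forces the team to classify each workload explicitly, which is half the value of the exercise.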

The numbers

What lock-in actually costs when it bites

Cost to migrate API integration (with abstraction layer) · Hours to days of engineering
Cost to migrate API integration (without abstraction layer) · Weeks to months at scale
Cost to re-embed a 10M-document corpus to switch embedding models · $200–$2,000 in embedding costs plus engineering time
Cost to re-fine-tune a model on a different base · Re-investment in training data preparation plus training compute
Frequency of vendor pricing changes (last 24 months) · Multiple per major vendor
Frequency of model deprecations (forcing migration) · At least one per year per major vendor
Engineering overhead for multi-vendor abstraction · Modest — typically 5–10% additional engineering cost vs single-vendor
Strategic value of being able to switch quickly · Material when negotiating vendor terms or responding to policy / availability changes

The migration costs are real but manageable with mitigation; the strategic value of optionality is substantial in negotiations and crises.
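The re-embedding figure is straightforward to reproduce as back-of-envelope arithmetic. The $0.02-per-million-tokens rate below is an assumed small-embedding-model price, not a quoted one — substitute your vendor's actual rate:

```python
def reembedding_cost_usd(docs: int, avg_tokens_per_doc: int,
                         price_per_million_tokens: float) -> float:
    """Back-of-envelope cost of re-embedding a corpus to switch embedding models."""
    total_tokens = docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * price_per_million_tokens

# 10M docs at ~1,000 tokens each, at an assumed $0.02 per million tokens:
print(round(reembedding_cost_usd(10_000_000, 1_000, 0.02), 2))  # → 200.0
```

Note this covers only the embedding API spend; the engineering time for the migration pipeline typically dominates, which is why the table's range runs well above the raw token cost.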

Common mistakes

How AI vendor decisions typically go wrong

Optimising for current-best capability without considering switching cost. Picking the leader on a 2024 benchmark commits you to that vendor’s roadmap. If their priorities diverge from yours, you’re stuck.

Skipping abstraction because “we’re moving fast.” The abstraction layer is small overhead at the start and large savings later. Teams that defer abstraction repeatedly are not actually moving fast on the things that matter.

Treating all vendors as equivalent. Not every vendor’s lock-in is the same. Pricing volatility, deprecation cadence, geographic availability, compliance posture all differ. Evaluate vendors on these dimensions, not just on demo quality.

Ignoring capacity / availability risk. AI vendors have had outages, capacity-constraint periods, and policy-driven access changes. Single-vendor dependency means your product is exposed to all of these.

Over-engineering for switch-ability. The inverse failure mode — building elaborate abstractions to protect against unlikely switches. Calibrate mitigation to realistic risk; protect against the high-impact and high-probability lock-in, accept lock-in elsewhere.

What's next

Related work

For the broader open-source-vs-proprietary framework, see Open-source vs proprietary AI — practical tradeoffs. For the local-vs-cloud architecture decisions, see When to run AI locally vs in the cloud. For the economics behind the lock-in trade-offs, see Tokens, context windows, and what they cost. For the cost-vector framework, see The hidden costs of “free” AI tools.

Common questions

FAQ

Should we use Azure OpenAI or AWS Bedrock instead of direct vendor APIs to reduce lock-in?

Partially. Azure OpenAI serves OpenAI's models on Microsoft infrastructure; Bedrock aggregates multiple model vendors within AWS. Both reduce some lock-in (cloud-vendor integration is unified) but introduce different lock-in (to the cloud vendor's specific deployments and pricing). The decision often comes down to which cloud vendor you're already on rather than to lock-in considerations alone.

Is LiteLLM / LangChain abstraction enough?

For API-level abstraction, yes — switching vendors is a config change once the abstraction is in place. For deeper lock-in (embeddings, fine-tunes, capability-specific features), the abstraction layer doesn't help; those require deeper architectural choices. Use the abstraction at the API level; address the other lock-in types separately.

How often should we re-evaluate vendor choices?

Every 6 months for active production AI workloads; annually for stable ones. The category moves fast enough that quarterly re-evaluation is overkill (the cost of re-evaluation exceeds the value of the changes) but annual is too slow (changes accumulate that could have been acted on).

Can we negotiate against the lock-in?

Yes — enterprise contracts increasingly include provisions for model-version stability, advance notice of deprecations, and pricing predictability. Use the lock-in awareness as negotiation leverage; vendors that won't commit to stability are signalling future lock-in pain.


Change history (1 entry)
  • 2026-05-13 Initial publication.