Most non-tech companies buy AI tools the way they buy other software: demo, references, security questionnaire, contract. The deal usually closes on the demo’s quality, which has roughly no predictive value for how the AI will hold up in production. The questions that actually predict outcomes — what does the AI fail at, what data flows where, how is the model updated, what happens to your pricing if your usage grows — get skipped because the buyer doesn’t know to ask.
Below: a 20-question checklist for non-technical buyers. Each question is operational, answerable in plain language, and produces a decision-relevant input. The framework doesn’t need a CTO in the room; it does need the discipline to ask the questions and not accept “trust the vendor” as the answer.
What the AI actually does (and doesn't)
- What is the specific workflow this AI is replacing or accelerating? Vague answers (“AI for productivity”) predict vague outcomes. Specific answers (“AI that classifies and routes customer-support tickets”) predict implementable deployments.
- What does accuracy look like in production on tasks like ours? Vendors will quote demo numbers. Ask for production data: actual customer-specific results, not benchmarks. Healthy vendors share this; vendors that deflect are telling you something.
- What are the failure modes? Every AI fails sometimes. Vendors that can’t articulate their failure modes either don’t understand them or don’t want to share them. Either is a red flag.
- How does the AI handle edge cases or unusual inputs? Production work is mostly the long tail. The demo shows the common 60%; the actual work includes the difficult 40%. Ask specifically about your domain’s edge cases.
- How do we know when the AI is wrong? The mechanism for catching errors matters as much as the error rate. Ask which of these the product offers: confidence scores, validation rules, audit logs, human-in-the-loop gates (see the sketch after this list).
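If you want a picture of what a human-in-the-loop gate looks like in practice, here is a minimal sketch. Everything in it is hypothetical: the vendor call is a stub and the 0.85 threshold is a placeholder. It shows the shape of the mechanism, not any particular product’s API.

```python
from dataclasses import dataclass

# Illustrative only: the vendor call is stubbed out and every name here is hypothetical.

@dataclass
class Prediction:
    label: str
    confidence: float  # 0.0-1.0, as many vendors report it

CONFIDENCE_THRESHOLD = 0.85  # placeholder; tune against your own error tolerance

def classify_ticket(text: str) -> Prediction:
    """Stand-in for the vendor's classification call."""
    return Prediction(label="billing", confidence=0.62)

def handle_ticket(text: str) -> str:
    prediction = classify_ticket(text)
    # Audit trail: record every decision, whether or not it was accepted.
    print(f"AUDIT: {prediction.label} @ {prediction.confidence:.2f} for {text[:40]!r}")
    if prediction.confidence >= CONFIDENCE_THRESHOLD:
        return f"routed automatically to the {prediction.label} queue"
    # Low confidence: the AI's answer goes to a person, not straight to the customer.
    return "escalated to human review"

if __name__ == "__main__":
    print(handle_ticket("I was charged twice for my March invoice."))
```

The point of asking the question is to learn whether the vendor offers a gate like this at all, and whether the threshold and the audit trail are under your control or theirs.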
What flows where, and what happens to it
- What data do we send to the AI vendor? Sometimes obvious; sometimes not (the AI may receive more context than is immediately visible).
- Where does the data physically reside? US, EU, regional. Affects regulatory exposure.
- Is our data used to train the vendor’s models? Default for free tiers: often yes. Enterprise tiers: usually no, but verify in writing.
- What are the data retention and deletion policies? How long does the vendor keep our data, and what happens when we churn? “Deleted within 30 days” is materially different from “retained indefinitely for service improvement.”
- What compliance certifications does the vendor hold? SOC 2 Type II, ISO 27001, HIPAA (if relevant), GDPR-readiness. Required for regulated industries; useful evidence for others.
How it fits with existing systems
- What’s the realistic integration effort? Plug-and-play, low-code, or a full engineering project? Vendors overstate ease; ask for the engineering hours of a representative recent customer integration.
- What systems does it integrate with natively? Your CRM, helpdesk, doc store, identity provider. Lack of native integration means custom integration work.
- What’s the API stability commitment? Will the vendor break our integration when they update? Healthy vendors version their APIs and deprecate slowly (see the sketch after this list).
- What happens if we need to migrate off? Data export formats, transition support, contractual obligations. The exit path matters even if you don’t expect to use it.
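On the API-stability question, one concrete thing to look for is whether the vendor versions its API (in the URL or in headers) and how long old versions stay supported once a new one ships. A minimal sketch of what a version-pinned integration looks like, using an entirely made-up vendor URL and request shape:

```python
import requests  # assumes a REST-style vendor; the URL and fields below are made up

API_BASE = "https://api.example-ai-vendor.com/v1"  # the "v1" pin is the point

def classify(text: str, api_key: str) -> dict:
    # Because the call is pinned to v1, the vendor can ship v2 without breaking it.
    # The question to ask: how long does v1 stay supported after v2 appears?
    response = requests.post(
        f"{API_BASE}/classify",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"text": text},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()
```

If the vendor’s answer amounts to “we update the endpoint in place,” budget for integration maintenance you did not plan for.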
What changes after the demo
- What’s the actual pricing at our expected usage? Demo pricing often differs from real-volume pricing. Ask for a realistic-volume cost projection (see the cost sketch after this list).
- How are pricing increases handled? Annual increases, mid-term adjustments, surprise charges. Standard SaaS contracts handle this; AI-specific contracts sometimes have usage-based surprises.
- What’s the indemnification posture? IP, accuracy, output liability. Major vendors offer limited indemnification on enterprise plans; consumer plans typically don’t.
- What’s the termination flexibility? A 12-month auto-renew with 60-day notice is common; mid-term termination is usually costly. Negotiate based on your confidence in the vendor.
- What’s the support and SLA commitment? Response times, availability guarantees, dedicated contacts. Production AI workloads benefit from real SLAs.
- What’s the vendor’s financial health? Smaller AI vendors face the same uncertainty most early-stage companies do. Established vendors with revenue carry less risk than venture-funded startups burning cash.
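To make the realistic-volume projection concrete, here is a back-of-the-envelope sketch with entirely hypothetical numbers; swap in the seat price, included volume, and overage rate from the vendor’s actual price sheet.

```python
# Hypothetical numbers throughout; replace with the vendor's actual price sheet.
SEATS = 25
PRICE_PER_SEAT_MONTHLY = 30.00        # list price per seat
INCLUDED_REQUESTS_PER_SEAT = 1_000    # volume bundled into the seat price
OVERAGE_PRICE_PER_REQUEST = 0.02      # usage-based charge above the bundle

expected_requests_per_seat = 2_500    # your estimate of real usage, not the demo's

overage = max(0, expected_requests_per_seat - INCLUDED_REQUESTS_PER_SEAT)
monthly = SEATS * (PRICE_PER_SEAT_MONTHLY + overage * OVERAGE_PRICE_PER_REQUEST)
demo_monthly = SEATS * PRICE_PER_SEAT_MONTHLY

print(f"Monthly at expected volume: ${monthly:,.2f}")       # $1,500.00 here
print(f"Annual at expected volume:  ${monthly * 12:,.2f}")  # $18,000.00 here
print(f"Monthly at the demo's assumed volume: ${demo_monthly:,.2f}")  # $750.00 here
```

In this made-up example the real bill is double the quoted seat price; the exact multiple matters less than whether the vendor will put the projection in writing.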
What due diligence actually costs and saves
The checklist’s value is in preventing six-figure mistakes; the time investment is modest relative to the cost of getting procurement wrong.
Related work
For the broader risk framework, see AI risk assessment for legal and compliance teams. For why programs fail without rigorous setup, see Why most “AI strategies” fail in the first 90 days. For the privacy-specific evaluation, see AI privacy — what to watch for. For the vendor-lock-in considerations that connect to procurement, see Vendor lock-in risks with AI.
FAQ
Do we need a CTO to run this checklist?
No, that's the point. Each question is answerable in plain language. Bring in a technical reviewer for the integration questions if you have one; many SMBs don't, and the checklist works without one. A trusted external advisor (fractional CTO, peer at another company) can fill in for the technical questions when needed.
What if the vendor won't answer some questions?
That's the answer. Vendors that deflect on data handling, accuracy, or contractual terms are signalling something. Some deflection is reasonable (early-stage vendors may not have formal answers); refusal to engage is a red flag.
Should we run all 20 questions for every AI vendor?
Yes, scaled to the deployment size. For a $20/month seat subscription, you can compress the conversation; for a $50,000/year enterprise contract, run the full checklist with documentation. The framework scales; the discipline of asking the questions is what matters.
What about open-source AI tools we're considering?
Most questions still apply; the answers shift. Data-handling becomes about your own infrastructure (you're the operator); accuracy and failure modes are still relevant; integration and exit considerations change shape. The same evaluation framework, applied to a different ownership model.