A vendor pitches an “AI-powered fraud detection” tool. The technical lead pushes back: “We could solve this with classical machine learning — XGBoost on transaction features — for a fraction of the cost.” The room goes quiet because nobody else in the meeting is sure what classical machine learning is, how it differs from “AI,” or whether the technical lead is being helpful or obstructive.
In most business meetings, “AI”, “machine learning”, “deep learning”, and “LLM” refer to roughly the same thing — usually whatever the speaker has read about most recently. The imprecision rarely matters until someone is about to spend money on the wrong thing. Buying an LLM solution to a classical ML problem (or vice versa) is how teams end up with expensive infrastructure that doesn’t solve the actual problem.
Two things below: the family tree in plain language, and the “which one fits which problem” rule. The point isn’t to make you a data scientist. It’s to give you enough vocabulary to push back in a meeting when someone proposes the wrong tool.
How the terms nest
The four terms form a strict hierarchy. Each is a subset of the one above it.
Artificial Intelligence (AI) is the broadest category. Coined in 1956, it refers to any system that performs tasks normally requiring human intelligence — reasoning, planning, perception, language, decision-making. Classical AI includes rule-based systems, expert systems, search algorithms, planners. Modern AI usually means something narrower (everything below), but the umbrella term still covers the rule-based and search-based approaches that predate machine learning.
Machine Learning (ML) is a subset of AI. Systems that improve at a task by learning patterns from data rather than being explicitly programmed. Started getting serious in the 1980s. Includes regression, decision trees, random forests, support vector machines, gradient boosting (XGBoost, LightGBM), clustering. Most “AI” deployed in production at non-tech companies in 2026 is classical ML — fraud detection, churn prediction, demand forecasting, lead scoring, recommender systems.
Deep Learning (DL) is a subset of ML. Neural networks with many layers (the “deep” in deep learning) trained on large datasets. Took off around 2012 when GPUs got fast enough to train usefully large networks. Includes image classification (ResNet), speech recognition (Whisper), recommender systems at scale (YouTube, TikTok), and the entire transformer family.
Large Language Models (LLMs) are a subset of deep learning. Specifically: transformer-based neural networks trained on enormous amounts of text to predict the next token. GPT, Claude, Gemini, Llama, Mistral. The newest and most talked-about category in the tree. Useful for text generation, summarisation, code, question answering, conversation. Less useful for tabular prediction, optimisation, and tasks classical ML solves better.
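Next-token prediction is easy to see in miniature. The toy bigram counter below is nothing like a transformer, but it optimises the same objective, which is the point worth holding onto:

```python
# The LLM training objective in miniature: predict the next token from context.
# A toy bigram counter, not a transformer; real models use vastly more context.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1          # count which word follows which

def predict_next(word: str) -> str:
    return counts[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" -- the most common continuation in the corpus
```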
The hierarchy is strict but not exhaustive — there are deep-learning systems that aren’t LLMs (image models, recommender systems), there’s ML that isn’t deep (most of what runs in production), and there’s AI that isn’t ML at all (rules, planners, search). The four terms cover most of what you’ll meet in business contexts.
The right tool for the right problem
The honest framing is: start with the simplest tool that can plausibly solve the problem, and only escalate when it can’t.
Rules, lookup tables, and spreadsheets. Right answer when the logic is small, the rules are explicit, and the inputs are bounded. Tax calculations, eligibility checks, pricing tables. People reflexively reach for ML when a CASE WHEN clause would do; the maintenance cost of a spreadsheet is much lower than the maintenance cost of any model.
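For concreteness, the whole of a rules tier can look like this. A hypothetical refund-eligibility check; the thresholds are invented:

```python
# A hand-written eligibility rule: auditable, deterministic, trivially cheap.
# The thresholds are invented for illustration.

def eligible_for_instant_refund(order_value: float, account_age_days: int,
                                prior_refunds_this_year: int) -> bool:
    if order_value > 200:          # high-value orders go to manual review
        return False
    if account_age_days < 30:      # new accounts go to manual review
        return False
    return prior_refunds_this_year < 3

print(eligible_for_instant_refund(49.99, account_age_days=400,
                                  prior_refunds_this_year=0))  # True
```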
Classical machine learning. Right answer when the problem has structured/tabular data, a clear target variable, and enough labelled examples (hundreds to thousands). Fraud detection on transaction features. Churn prediction from usage data. Demand forecasting from history + calendar + weather. Lead scoring from CRM features. Almost everything finance, ops, and growth teams do is classical ML territory.
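A minimal sketch of what that looks like in practice, assuming a transactions table with invented column names and a 0/1 fraud label:

```python
# Classical ML on tabular data: gradient boosting over transaction features.
# The file name and column names are placeholders for illustration.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

df = pd.read_csv("transactions.csv")  # assumed: one row per transaction
X = df[["amount", "merchant_risk", "hour_of_day", "distance_from_home"]]
y = df["is_fraud"]                    # assumed: 0/1 label column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)

print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```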
Deep learning (non-LLM). Right answer for high-dimensional unstructured inputs — images, audio, video — and for recommender systems at scale (the patterns are too complex for hand-crafted features). Image classification, speech-to-text, computer vision, recommender systems that learn from millions of user interactions. Heavier infrastructure than classical ML; less interpretable.
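The usual starting point is transfer learning rather than training from scratch. A sketch, assuming a labelled image dataset and an invented class count:

```python
# Transfer learning: take a pre-trained ResNet, freeze it, swap the head.
# Dataset wiring is omitted; the class count is invented for illustration.
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 5  # e.g. five defect types on a production line (illustrative)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # ImageNet weights
for param in model.parameters():
    param.requires_grad = False                       # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # new trainable head

# From here: a standard PyTorch training loop over your labelled images.
```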
LLMs. Right answer for text — generation, summarisation, classification of unstructured text, question answering, code, conversation. Best when the task is genuinely about language and the cost per inference (cents to dollars) is acceptable. Not the right answer for tabular prediction (classical ML is cheaper and often more accurate), for optimisation (classical operations research wins), or for tasks where deterministic output matters (rules win).
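A sketch of the text-classification case, using the openai Python client as one example; the model name is illustrative, and any hosted LLM provider works the same way:

```python
# Classifying unstructured text with an LLM. Provider and model are
# illustrative choices, not recommendations.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ticket = "I was charged twice for my March invoice and need a refund."

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system", "content": "Classify the support ticket as one of: "
                                      "billing, technical, account, other. "
                                      "Reply with the label only."},
        {"role": "user", "content": ticket},
    ],
)
print(resp.choices[0].message.content)  # e.g. "billing"
```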
The pattern: tool category follows problem shape, not the other way around. Teams that pick the tool first and then look for problems usually end up with infrastructure that doesn’t pay back.
What each tier actually costs to run
The cost asymmetry is real. A classical ML solution that works is typically two to four orders of magnitude cheaper per inference than an LLM solution: fractions of a cent on shared CPU versus cents to dollars per call. The decision rule “start with the simplest tool” is partly an engineering best practice and partly a cost-control argument.
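Back-of-envelope arithmetic makes the gap concrete. Both prices below are invented for illustration; real numbers vary by provider, model, and prompt size:

```python
# Illustrative per-inference costs, not quotes from any provider.
llm_cost_per_call = 0.01       # assumed: ~1 cent per LLM call
ml_cost_per_call = 0.000001    # assumed: ~a ten-thousandth of a cent on shared CPU

calls_per_day = 1_000_000      # e.g. scoring every transaction

print(f"LLM:       ${llm_cost_per_call * calls_per_day:,.0f}/day")   # $10,000/day
print(f"Classical: ${ml_cost_per_call * calls_per_day:,.2f}/day")    # $1.00/day
print(f"Ratio:     {llm_cost_per_call / ml_cost_per_call:,.0f}x")    # 10,000x
```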
When to use which
A short decision tree that covers most business cases; the same logic is sketched as runnable code after the list:
- Can a spreadsheet, lookup table, or hand-written rules solve it? Use those. They are auditable, deterministic, and free.
- Is the input tabular (rows and columns) with a clear target variable, and do you have labelled training data? Use classical ML. XGBoost, LightGBM, or a logistic regression will outperform an LLM at lower cost.
- Is the input unstructured (images, audio, video) but the task is well-defined (classify, detect, transcribe)? Use a deep-learning model specialised for that input type — often a pre-trained model fine-tuned on your data.
- Is the input text and the task involves generating text, summarising text, answering questions, or classifying meaning? Use an LLM.
- Does the problem involve optimisation under constraints, planning, or search? Use operations research (linear programming, constraint solvers) rather than ML. This is one of the categories where “AI” still often means non-ML approaches.
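As code, if you want to pin it to a wall. Categories only: real triage has more nuance than five booleans, but the ordering is the important part:

```python
# The decision tree above, as a function. Inputs are deliberately coarse.

def pick_tool(rules_suffice: bool,
              tabular_with_labels: bool,
              unstructured_non_text: bool,
              text_task: bool,
              constrained_optimisation: bool) -> str:
    if rules_suffice:
        return "rules / lookup table / spreadsheet"
    if tabular_with_labels:
        return "classical ML (XGBoost, LightGBM, logistic regression)"
    if unstructured_non_text:
        return "deep learning (pre-trained model, fine-tuned on your data)"
    if text_task:
        return "LLM"
    if constrained_optimisation:
        return "operations research (linear programming, constraint solver)"
    return "unclear: sharpen the problem statement before picking a tool"
```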
The most common mistake is to skip to step 4 because LLMs are familiar, and end up paying ten cents to do what one tenth of a cent would have done. The second-most-common mistake is to ignore step 5 entirely — many “AI-powered scheduling” problems are constraint-satisfaction problems and the right tool is a constraint solver, not a model.
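For the scheduling case, a constraint solver in miniature, using Google’s OR-tools CP-SAT as one common choice; the workers and the constraint are invented:

```python
# Toy constraint-satisfaction scheduling: assign three workers to three
# shifts, no worker taking two shifts. Names and constraint are illustrative.
from ortools.sat.python import cp_model

workers = ["ana", "ben", "caro"]
model = cp_model.CpModel()

# shifts[i] = index of the worker assigned to shift i
shifts = [model.NewIntVar(0, len(workers) - 1, f"shift_{i}") for i in range(3)]
model.AddAllDifferent(shifts)                  # each worker takes one shift
model.Add(shifts[0] != workers.index("ben"))   # e.g. Ben can't take the early shift

solver = cp_model.CpSolver()
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    print([workers[solver.Value(s)] for s in shifts])
```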
Related explainers
For the specific operator-level use of LLMs, see What an LLM does for a business. For the supporting concept that powers most modern AI applications, see Embeddings explained without math. For the longer “what tasks are AI right for” discussion, see When AI is the wrong tool.
FAQ
Where does "generative AI" or "GenAI" fit in this tree?
GenAI is a colloquial term for the subset of AI that generates new content — text (LLMs), images (diffusion models), audio, video. It sits inside deep learning, sometimes inside LLMs, sometimes alongside them (image-generation models are deep learning but not LLMs). When someone says "GenAI" they usually mean LLMs and image generators; the term is more about the output (it generates content) than about a distinct technical category.
What about "AI agents"?
An "agent" in 2026 usage is an LLM-powered system that does multi-step work — calls tools, makes plans, executes them. Architecturally it's an application pattern built on top of LLMs (the LLM is the reasoning component); the term is about how the system is wired together, not a new model category. "AI agent" sits one level above LLMs in the application stack, not in the model hierarchy.
Is "deep learning" still distinct from "AI" in 2026, or has it collapsed?
Still distinct. Deep learning specifically refers to neural networks with many layers; AI as a field is broader. The terms have converged in popular usage — when people say "AI" in business contexts, they usually mean deep learning or LLMs — but the distinction matters when you're choosing tools. A fraud-detection system using XGBoost is AI but not deep learning; a chatbot using Claude is deep learning AND AI; a constraint solver scheduling delivery routes is AI but neither deep learning nor ML in the modern sense.
Should my team know all four, or just LLMs?
Vocabulary in all four is worth having. Most teams in 2026 will use LLMs as their default new AI tool, and that's often fine — but they should be able to recognise the cases where classical ML would be cheaper and faster, the cases where rules would be safer, and the cases where a vision model fits better than text generation. The point isn't depth in all four; it's enough fluency to know which conversation you're in.
When does classical ML beat an LLM on a specific task?
Three reliable cases. (1) Tabular prediction on structured features: XGBoost on transaction features typically beats a general-purpose LLM on fraud-detection accuracy, and by roughly 10,000× on per-inference cost. (2) Volume-sensitive workflows where milliseconds and fractions of a cent matter: recommendation systems, ad ranking, lead scoring. (3) Regulated contexts where interpretability is required, such as credit scoring, hiring decisions, and insurance underwriting: classical ML with feature-importance explanations is defensible, while LLMs are much harder to audit. The honest test: if you can describe the input as rows in a spreadsheet and you have labelled examples, try classical ML first.
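On case (3), the auditable artefact is often a one-screen feature ranking. A sketch on synthetic data, with invented feature names:

```python
# Feature-level explanations from a classical model: the kind of artefact
# case (3) refers to. Data is synthetic; feature names are invented.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=4, random_state=0)
features = ["amount", "merchant_risk", "hour_of_day", "distance_from_home"]

model = GradientBoostingClassifier().fit(X, y)
for name, weight in sorted(zip(features, model.feature_importances_),
                           key=lambda pair: -pair[1]):
    print(f"{name:20s} {weight:.2f}")   # which inputs drive the prediction
```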