Cyberax AI Playbook
Explainer · Foundations

What an LLM actually does for a business

What large language models — the technology behind ChatGPT, Claude, and Gemini — really are, what they're genuinely useful for, and where they fail. In plain language, with no jargon, no hype, and no acronyms left undefined.

At a glance
Last verified · May 2026
Problem solved · Get a clear, jargon-free picture of what AI language tools can and can't do — without reading marketing copy or research papers
Best for · Founders, managers, ops leads, and anyone deciding whether to bring AI into their team's work
Tools · ChatGPT, Claude, Gemini
Difficulty · Beginner

A large language model — an LLM, the technology behind ChatGPT, Claude, and Gemini — is a program that predicts what text is most likely to come next, given some text you provide. That’s the whole idea. The reason it can feel magical is that “predicting plausible next text” turns out to cover an astonishing range of useful work: summarising a document, drafting an email, answering a question, translating a paragraph, sorting a support ticket, writing a piece of code.

This page walks through what an LLM is, the three things it does genuinely well, the three things it does badly, and a small decision rule for when to bring one into your work.

The thing itself

What an LLM actually is

A large language model is a program that, given some text, predicts what text is most likely to come next. Trained on a very large amount of writing — books, websites, documentation, code, conversations — it has absorbed the patterns in how humans use language. It doesn’t store the text it was trained on; it stores the statistical regularities of that text in a compressed form (the “weights”), and uses those regularities to generate new text one piece at a time.

That’s it. It’s a very, very capable next-text predictor.

The reason this seems magical is the range of work that “predicting the next plausible piece of text” turns out to cover. Summarising, drafting, answering, translating, classifying, writing code: all of these can be expressed as “given this input text, what text should come next?”

What it is not: a database of facts, a reasoner from first principles, a calculator, or a system that knows what it knows. When it gives a confident-sounding wrong answer, it isn’t lying — it’s predicting plausible text in the same way it does when it gives a confident-sounding right answer. The mechanism is the same.
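
To make “predicting the next piece of text” concrete, here is a toy sketch in Python. It counts which word tends to follow which in a tiny sample, then predicts from those counts. Real models use neural networks over sub-word tokens trained on vastly more text, not a word-count table, but the job is the same: use absorbed patterns to predict what comes next.

  from collections import Counter, defaultdict

  # A tiny training corpus. Real models train on trillions of words.
  training_text = (
      "the meeting is at noon . the meeting is cancelled . "
      "the meeting is long . the report is ready ."
  )

  # Count which word follows which (a "bigram" table).
  follows = defaultdict(Counter)
  words = training_text.split()
  for current, nxt in zip(words, words[1:]):
      follows[current][nxt] += 1

  def predict_next(word):
      """Return the word most likely to come next, by count."""
      return follows[word].most_common(1)[0][0]

  print(predict_next("the"))      # "meeting" (seen three times vs "report" once)
  print(predict_next("meeting"))  # "is"

Scale that mechanism up by many orders of magnitude and you get the confident right answers and the confident wrong ones alike, produced the same way.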

What it does well

Three things LLMs are genuinely good at

Transforming text from one form to another. Summarising a long document into bullet points, rewriting a paragraph at a different reading level, translating between languages, reformatting an unstructured email into a structured ticket, extracting names and dates from a contract, turning a transcript into action items. These tasks have a clear input and a clear output, and the model is performing a transformation on text it can see. This is where LLMs are strongest (a short code sketch follows this list).

Drafting plausible first text from a brief. A first-pass marketing email, a code outline, a meeting agenda, a cover letter, a social post. The output usually needs human editing before it ships, but the time saved on the empty-page problem is real.

Pattern-matching against text it can see. Classifying support tickets by topic, finding the section of a long document that answers a question, comparing two pieces of text for differences. When the relevant information is in the prompt, the model can work with it accurately.
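
As a concrete sketch of the first pattern, the snippet below turns an unstructured email into a structured ticket using OpenAI’s Python library; equivalent calls exist for Claude and Gemini. The model name, the email, and the ticket fields are placeholders, not a recommendation.

  from openai import OpenAI

  client = OpenAI()  # reads the OPENAI_API_KEY environment variable

  email = (
      "Hi, the export button has been greyed out since this morning's "
      "update. We're on the Teams plan. Can someone look today? - Dana, Acme"
  )

  response = client.chat.completions.create(
      model="gpt-4o",  # placeholder; use whichever flagship model you have
      messages=[
          {
              "role": "system",
              "content": "Convert the customer email into a support ticket "
                         "with fields: customer, company, product_area, "
                         "urgency, summary.",
          },
          {"role": "user", "content": email},
      ],
  )
  print(response.choices[0].message.content)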

What it doesn't do well

Three places LLMs fail in business contexts

Anything requiring genuine knowledge of facts the model wasn’t trained on. Your company’s internal data, this morning’s news, last quarter’s pricing change, a product that launched after the model’s training cutoff — none of this is inside the model. If you ask anyway, you’ll often get a plausible guess, and nothing in the response itself reliably signals that it is a guess. The fix is to give the model the relevant facts at the time of the question — a pattern called retrieval, covered in RAG explained without acronyms and sketched in code after this list.

Mathematical and logical reasoning beyond a small number of steps. Modern models can do arithmetic and basic algebra, but reliability drops sharply as problems get longer or require keeping intermediate state. A model that gets the answer right in one prompt may get it wrong in the next phrasing. For anything that has to be exactly right — financial calculations, conversion logic, deterministic transformations — write code or use a calculator. (Modern flagship models offer a “code interpreter” or “tool use” mode that delegates math to actual code under the hood; that’s the right setting for arithmetic.)

Keeping consistent state across long interactions. A model has a context window — the amount of text it can hold in mind at once. Today’s flagship models can hold around 1 million tokens (roughly 700,000 words), which sounds enormous, but in practice attention degrades over very long contexts and the model can lose track of details mentioned earlier. Treat each conversation as if the model has a short memory that you are responsible for refreshing.
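
The retrieval fix named above can be sketched in a few lines: look the facts up yourself, then hand them to the model next to the question. The hard-coded dictionary below stands in for a real document store, and the model name is again a placeholder.

  from openai import OpenAI

  client = OpenAI()

  # Stand-in for a real knowledge base. A real system would search it.
  internal_facts = {
      "pricing": "As of Q2, the Teams plan is $18/seat/month, billed annually.",
      "support": "Support is staffed 06:00-18:00 US Eastern, Monday to Friday.",
  }

  question = "What does the Teams plan cost per seat?"
  facts = internal_facts["pricing"]  # hard-coded lookup for the sketch

  response = client.chat.completions.create(
      model="gpt-4o",  # placeholder
      messages=[
          {
              "role": "system",
              "content": "Answer using only the facts provided. If the facts "
                         "don't cover the question, say you don't know.",
          },
          {"role": "user", "content": f"Facts:\n{facts}\n\nQuestion: {question}"},
      ],
  )
  print(response.choices[0].message.content)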

The numbers

What it costs and how big it gets

For most business questions, the cost of a single interaction with a flagship model is fractions of a cent. The volume tier — running thousands of automated calls per day, or processing millions of documents — is where pricing becomes a planning constraint.

Standard consumer tier (ChatGPT Plus, Claude Pro, Google AI Pro) · ~$20/month
Power-user tier (ChatGPT Pro, Claude Max, Google AI Ultra) · $100–$250/month (entry tiers $100; top tier $200–$250)
Free tier, flagship model access · Limited daily messages from all three vendors
Typical API cost, input · $0.10–$5 per million tokens for mid-tier flagships (~$0.0001–$0.005 per short question); top reasoning/Pro tiers reach $30/M
Typical API cost, output · $0.40–$30 per million tokens for mid-tier flagships (3–5× input cost); top reasoning/Pro tiers reach $180/M
Largest context window available in 2026 · ~1M tokens (~700k words) on flagship Claude, Gemini, and GPT-5.5

A token is roughly three-quarters of an English word. A typical short email is ~200 tokens; a long internal memo, ~2,000; a 300-page book, around 100,000. The pricing-per-million-tokens framing means that even at mid-tier flagship rates, processing a long document costs cents — running it across millions of documents is where the bill grows.
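
A back-of-envelope version of that arithmetic, using the three-quarters rule of thumb and illustrative mid-tier prices (check your vendor’s current price sheet before planning around these numbers):

  WORDS_PER_TOKEN = 0.75           # rough rule of thumb for English
  INPUT_PRICE_PER_M = 3.00         # illustrative $ per million input tokens
  OUTPUT_PRICE_PER_M = 15.00       # illustrative $ per million output tokens

  doc_words = 15_000                        # a long internal report
  input_tokens = doc_words / WORDS_PER_TOKEN
  output_tokens = 500 / WORDS_PER_TOKEN     # a one-page summary back

  cost = (input_tokens * INPUT_PRICE_PER_M
          + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
  print(f"One document:          ${cost:.4f}")               # $0.0700
  print(f"One million documents: ${cost * 1_000_000:,.0f}")  # $70,000

Seven cents for one document is noise; the same job across a million documents is a budget line.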

When to use one

A small decision rule

Use an LLM when: the task is text-in-text-out, a draft is more useful than nothing, and a human will see the output before it has consequences. Most “AI in the business” wins look like this — drafting first passes, classifying inbound messages, summarising long documents, surfacing relevant sections of a knowledge base.

Don’t use an LLM when: the answer must be exactly right with no review (use code or a calculator), the relevant information isn’t visible to the model and you haven’t built a way to give it (the model will guess, and the guess will sound confident), or a simpler tool would do the job (a regex, a SQL query, a saved search). For a fuller treatment, see When AI is the wrong tool.
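
To make the simpler-tool point concrete: pulling a known identifier format out of free text is a one-line regular expression, deterministic and free, with no chance of a confident wrong guess. The ORD- format below is invented for illustration.

  import re

  message = ("Hi, orders ORD-48213 and ORD-48220 never arrived. "
             "Please check ORD-48213 first.")

  # One deterministic line. No model, no API cost, no guessing.
  order_ids = sorted(set(re.findall(r"ORD-\d{5}", message)))
  print(order_ids)  # ['ORD-48213', 'ORD-48220']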

The single most common mistake we see early adopters make: skipping the human-review step on things that ship to customers. The model will be right most of the time, which is exactly what makes the rare confident error so damaging — it slips through precisely because nine out of ten outputs are fine.

Common questions

FAQ

Is an LLM the same as AI?

No. "AI" is a broad umbrella covering many different kinds of systems — image classifiers, recommendation engines, robotics, voice systems, and large language models, among others. LLMs are one specific kind of AI focused on text. When current marketing says "AI," it almost always means an LLM in practice.

Do I need to pay to use one?

All three major vendors (OpenAI, Anthropic, Google) offer a free tier with daily limits. The free tiers are useful for trying things out and for low-volume personal use. The paid consumer tier (~$20/month) gives much higher limits and access to the best model. Pay only when you've established the tool is genuinely useful for the work — don't subscribe on enthusiasm.

Will the model learn from my data?

On consumer plans, all three major vendors (OpenAI, Anthropic, Google) now train on chat data by default — Anthropic flipped its Claude consumer default in August 2025, putting it in line with ChatGPT and Gemini. All three offer opt-out settings. API, team, and enterprise plans on all three vendors do not train on your data by contract. Vendor policies change; for the full picture, see AI privacy — what to watch for.

Can it read PDFs, images, or audio?

Yes — modern flagship models from all three vendors are multimodal, meaning they can take images, PDFs, and (in some cases) audio as input. Quality varies: text extraction from clean PDFs is excellent; reading complex tables or handwritten notes is hit-or-miss; analysing photos is good for general scenes but unreliable for fine detail. Always spot-check the extraction before relying on it.

Will the model remember previous conversations?

Inside one conversation, yes — up to its context window. Across conversations, it depends on the product: ChatGPT and Gemini have a "memory" feature that selectively persists facts across chats; Claude does not by default. None of these memories are reliable enough to use as a database. Treat each session as starting near-fresh and re-supply the context the model needs.

How is this different from a search engine?

A search engine returns a list of documents that probably contain your answer; an LLM gives you an answer directly, in your phrasing. The trade-off: search engines link you to sources you can verify; an LLM may produce confident text without traceable sources. Newer products combine the two — "AI search" tools like Perplexity, ChatGPT Search, and Google's AI Overviews use an LLM to summarise live search results, which is a meaningful improvement on either alone.


Change history (1 entry)
  • 2026-05-10 Initial publication.