An AI agent is an AI system that can take actions in sequence, not just produce a single answer: it decides what to do next, calls a tool or an API (application programming interface — the way one piece of software calls another), reads the result, and chooses the next step. That’s the core idea. Everything else is a matter of how much the system actually decides on its own.
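In code, the core loop is small. A minimal sketch in Python, assuming the caller supplies `call_model` (a wrapper around an LLM API) and `run_tool` (a dispatcher that executes a named tool); both are hypothetical stand-ins, not any specific framework’s API:

```python
# Minimal sketch of the core agent loop: decide, act, observe, repeat.
# `call_model` and `run_tool` are hypothetical callables supplied by the
# caller; nothing here is a real framework's API.

def run_agent(goal, call_model, run_tool, max_steps=10):
    history = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        decision = call_model(history)            # the model picks the next step
        if decision["type"] == "final_answer":    # it chose to stop and answer
            return decision["content"]
        result = run_tool(decision["tool"], decision["args"])  # act
        history.append({"role": "assistant", "content": repr(decision)})
        history.append({"role": "tool", "content": result})    # observe
    return "Step budget exhausted without a final answer."
```

Every rung on the spectrum below is a variation on how much of this loop the model, rather than the developer, controls.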
The label has been stretched. Vendors now apply “agent” to anything from a chatbot with one tool call to a multi-step workflow runner to a genuinely autonomous research system, so when someone says “we built an AI agent for X,” the listener has no clear idea what was actually built. The gap between marketing and reality is wider here than in any other current AI category.
This piece is the honest taxonomy: the five rungs of what “agent” actually covers, where capability genuinely sits in 2026, and the questions that cut through vendor claims.
The spectrum of what “agent” covers
The label applies across a wide capability range:
- Tool-using chatbots. A chatbot that can call one or two external functions (look up an order, schedule a meeting) is increasingly called an “agent.” This is the lightest version — a slight extension of conversational AI, not autonomous behaviour.
- Multi-step workflow runners. A system that takes a goal (“send a follow-up email to the prospect after the call”) and runs through a defined sequence of steps. The sequence is mostly pre-determined; the AI fills in the language and small decisions. Tools like Zapier’s AI features, Make’s agent modules, and many “AI workflow” products fit here.
- Tool-orchestrating agents. Systems that decide which tools to call and in what order based on the situation. Given a goal, they plan, call APIs, observe results, adapt. ReAct-style agents in LangChain, OpenAI’s function-calling-loop patterns, Claude with tool use. This is where the term most rigorously applies; the capability is real and increasingly production-ready for well-defined tasks. (The sketch after this list contrasts this rung with a workflow runner.)
- Autonomous multi-step research / problem-solvers. Systems that handle open-ended goals — “research this company and produce a brief,” “investigate this customer issue and recommend an action.” These chain together many tool calls, manage long contexts, and self-correct. Products like Manus and the agent capabilities in Claude Sonnet 4.5+/Opus 4.7 are here. The capability is real but still bounded; multi-hour autonomous runs work sometimes and fail in interesting ways.
- Truly autonomous software systems. Software that operates without human intervention on production work over long horizons. This is mostly future tense — demos exist, production deployments are narrow.
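The key boundary on that spectrum sits between rungs two and three: who decides the sequence. A hedged sketch of the difference, with every function and field name hypothetical rather than taken from any product:

```python
# Illustrative contrast between rung two (workflow runner) and rung three
# (tool-orchestrating agent). All names are hypothetical.

def workflow_runner(call_record, llm, send_email):
    # Rung two: the sequence is fixed in code; the model only fills in language.
    summary = llm(f"Summarise this call: {call_record['transcript']}")
    draft = llm(f"Draft a follow-up email based on: {summary}")
    send_email(call_record["prospect"], draft)   # step order never varies

def orchestrating_agent(goal, llm, tools, max_steps=10):
    # Rung three: the model chooses which tool to call, in what order,
    # based on what it observes along the way.
    history = [goal]
    for _ in range(max_steps):
        action = llm(history, available_tools=list(tools))
        if action.get("done"):
            return action["answer"]
        observation = tools[action["tool"]](**action["args"])
        history.append((action, observation))    # observe, then re-plan
```

Rungs four and five run the same kind of loop over longer horizons and larger tool sets, which is where the “fail in interesting ways” behaviour lives.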
The vendor’s “agent” can be any of these. The customer’s expectation is usually the third or fourth; the product is usually the first or second. Hence the disappointment cycle.
What works today, what doesn’t
Works reliably: Tool-using chatbots, multi-step workflows with clear paths, tool-orchestrating agents on well-defined tasks (single-domain customer service, structured data extraction, defined research questions). These are production-deployed at scale across industries.
Works with care: Autonomous research and problem-solving on open-ended tasks within defined boundaries. The agent can produce useful work but needs human oversight on outputs; long-horizon runs are not reliably correct without intervention.
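One common shape for that oversight, sketched under the assumption that every proposed action is tagged as reversible or not (all names illustrative):

```python
# Hypothetical approval gate: the agent drafts freely, but anything
# irreversible blocks on a human decision. All names are illustrative.

def with_human_gate(action, execute, request_review):
    if action["irreversible"]:             # e.g. sending email, issuing a refund
        approved = request_review(action)  # blocks until a human approves/rejects
        if not approved:
            return {"status": "rejected_by_reviewer"}
    return execute(action)
```

The gate turns “needs human oversight” from a caveat into a concrete checkpoint in the workflow.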
Mostly doesn’t work yet: Truly autonomous software handling consequential decisions over weeks. Demos exist; production at scale doesn’t. The agent layer can plan and execute but the failure modes are still substantial enough that consequential autonomy is rare in 2026.
The questions that cut through marketing
When a vendor pitches an “agent” product, ask:
- What does the agent actually decide? If the answer is “fills in language for a pre-defined workflow,” it’s a workflow runner with AI augmentation. If the answer is “picks tools and chains them based on the situation,” it’s a real orchestration agent. Both can be useful; understand which you’re buying.
- What’s the failure rate on a real task? Don’t accept demo videos. Ask for the agent’s success rate on a representative task set; ask what failure modes look like. Honest vendors will share this; pitch-heavy vendors will deflect.
- What’s the human-in-the-loop pattern? Most production agents need oversight on outputs (one gate pattern is sketched above). Where in the workflow does the human check? If “nowhere” is the answer, either the use case is trivial or the system is over-claiming.
- How does the agent handle the long tail? The 80% of common cases work in demos; the 20% of edge cases are where production agents struggle. Ask specifically about the edge cases in your domain.
- What’s the cost per run? Multi-step agent runs consume many LLM calls, and cost per task can be substantial at production volume. Get the realistic number, not the demo number; a rough model follows this list.
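That rough model, as back-of-envelope arithmetic. Every number below is an assumption for illustration (step counts, token volumes, and per-token prices vary widely by model and task); the point is the shape of the calculation, not the figures:

```python
# Back-of-envelope cost per agent run. All numbers are illustrative
# assumptions, not real vendor prices.

STEPS_PER_RUN = 12               # tool-call iterations in a typical run
INPUT_TOKENS_PER_STEP = 6_000    # context grows as history accumulates
OUTPUT_TOKENS_PER_STEP = 400
PRICE_IN_PER_M = 3.00            # $ per million input tokens (assumed)
PRICE_OUT_PER_M = 15.00          # $ per million output tokens (assumed)

cost_per_run = STEPS_PER_RUN * (
    INPUT_TOKENS_PER_STEP / 1e6 * PRICE_IN_PER_M
    + OUTPUT_TOKENS_PER_STEP / 1e6 * PRICE_OUT_PER_M
)
print(f"~${cost_per_run:.2f} per run")                     # ≈ $0.29 here
print(f"~${cost_per_run * 10_000:,.0f} per 10,000 runs")   # ≈ $2,880
```

Because history accumulates across steps, input tokens dominate, and a three-step demo undercounts both the step count and the context size.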
What agent deployments actually look like at scale
The “agent” framing is real in some categories and mostly aspirational in others. The bounded-capability framing — “this works for X, doesn’t yet work for Y” — beats both the hype and the dismissal.
Related work
For the specific case of AI agents in customer-facing roles, see AI agents for inbound qualification. For the live-chat AI framework, see Live-chat AI: when it works and when it actively hurts trust. For the broader hallucination risk in agentic systems, see AI hallucinations explained. For the open-source-vs-proprietary lens, see Open-source vs proprietary AI — practical tradeoffs.
FAQ
Should we build our own agent or buy a vendor product?
Buy when your task matches a vendor’s domain and the cost is reasonable; build when your workflow is bespoke enough that no vendor matches. The common pattern: buy for standard tasks (customer support, lead qualification, meeting scheduling), build for your unique competitive workflows. Building is meaningfully more work than the demos suggest; budget conservatively.
Are AI agents replacing jobs?
Some, with caveats. Agents work well in narrow, well-defined task categories, and jobs built around those tasks face automation pressure. Open-ended, judgement-heavy, relationship-driven, high-stakes work is mostly safe through 2026 and beyond. The job-impact picture is uneven; honest scenarios involve transformation more than wholesale replacement.
How do I distinguish a real agent from a marketing-labelled chatbot?
Three questions. (1) Does it call tools? Real agents interact with systems. (2) Does it plan? Real agents decide multi-step sequences based on the situation. (3) Can it recover from failure? Real agents observe results and adapt; chatbots execute regardless. If a product fails any of these tests, it’s a different category from what the agent label suggests.
What's the realistic horizon for autonomous agents handling consequential work?
Categories vary. Narrow, well-defined consequential work (specific medical decisions with strong oversight, structured trading within bounds, defined manufacturing-control tasks) is partially happening now. Open-ended consequential work (running a business autonomously, complex legal decisions) remains future-tense. The arc points toward more autonomy in defined domains; sweeping general-purpose autonomy is further away than vendor pitches suggest.