May 2026AI Fundamentals9 min read

Three concepts that will save you from AI snake oil: context windows, hallucinations and agents

Most of the questionable AI advice doing the rounds in 2026 falls apart the moment you understand three concepts: the context window, why models hallucinate, and what an AI agent actually is. Get these straight and you can call the difference between a real builder and a confident pitch deck inside ten minutes.

We use this exact filter on every supplier conversation we sit in on. It is not a research paper. It is a survival kit for owners who have to make spend decisions next quarter without a CTO in the room.

1. The context window: a model's working memory

A large language model does not “remember” you between conversations. Each time it generates a reply it reads everything in front of it - the system prompt, the chat history, any documents you attached - in one go and predicts the next word. That whole pile is the context window.

The window is measured in tokens, not characters. A token is roughly three quarters of an English word. “The plumber arrived” is about 4 tokens. A typical contract is around 5,000.

System prompt

rules + persona

Chat history

every turn so far

Your message

+ any attached docs

read in one pass

next-token guess

Modern frontier models have huge windows by historic standards - Anthropic's Claude family runs at 200,000 tokens, Google Gemini at 1-2 million, the open Llama 4 family at 1 million. That sounds infinite. It is not.

Why size isn't everything

Recall degrades with depth.The industry-standard “needle in a haystack” benchmark shows that even leading models miss specific facts buried in the middle of a million-token document. Front and back are usually fine. The middle leaks.
You pay for every token. Pricing is per million input tokens. A 500-page PDF chucked into a chat is roughly 200,000 input tokens - per request. Do that ten times a day and the bill compounds.
Latency scales with input. Bigger contexts mean slower replies. Real-time chat with a 1M-token context is not an experience anyone enjoys.

2. Hallucinations: not a bug, a property

A hallucination is when a model confidently states something false. It made up a citation, invented a phone number, named the wrong director of a company. The temptation is to call it a bug and wait for a fix. That misreads what the model is doing.

A language model is not a database. It is a probability machine that produces the most plausible-looking next token given everything before it. “Plausible-looking” and “true” are correlated, not identical.

Hallucinations are baked in. They cannot be eliminated, only reduced - and the techniques for reducing them are well understood. If a vendor tells you their model “does not hallucinate”, you are being sold something.

What actually reduces hallucinations

Retrieval (RAG)

Pull the answer from your own documents and pass it to the model as context. Force the model to quote. Biggest single lever.

Tool use

Let the model call a calculator, a database, an API. Maths and lookups stop being guessed.

Structured output

Constrain replies to a JSON schema you define. The model still has to make things up - but it can't invent fields.

Citations

Demand the model attach a source for every claim. Forces grounding and makes review fast.

3. Agents: a working definition for non-engineers

“AI agent” is the most stretched term in software right now. The clean definition we work to:

An agent is a language model that, given a goal, decides which tools to call, calls them, reads the results, and decides what to do next - in a loop, until it judges the goal met.

The loop is the whole game:

Goal

from you

Plan

LLM decides next step

Act

call a tool

Observe

read the result

repeat

Done?

model judges

A chatbot that just talks back at you is not an agent. A workflow with if-this-then-that rules is not an agent. An assistant that can read your inbox, draft a quote, check the calendar, and book a follow-up - choosing each step itself - is an agent.

Where agents work, and where they don't

Good fit	Bad fit	Why
Triaging support tickets	Approving refunds without review	Reversible vs irreversible
Drafting weekly reports	Sending payments	Cost of being wrong
Researching prospects	Negotiating contracts	Tribal knowledge can't be retrieved
Code refactors with tests	Production deploys without review	Tests are the safety net

The bigger picture

concepts

to filter 90% of AI sales decks

200K

Claude context

tokens, ~150K words

hallucination floor

reducible, never zero

None of this is a brake on using AI. It is a brake on buying it badly. The teams that get value from AI in 2026 share three habits: they retrieve their own data instead of stuffing it into prompts, they treat every model output as a draft to be checked, and they put humans in front of any irreversible action an agent can take.

That is the whole playbook. Most of the rest is decoration.

Want to take this further?

What is an AI agent? - 5-minute primer - illustrated, with archetypes and a worked example.
The 60-minute AI readiness self-assessment - 30 questions to score where your team really sits.
Prompt patterns: 8 reusable shapes that work - the prompts we reach for, with examples.