Glasswork Analytics
May 2026 · AI Fundamentals · 9 min read

Three concepts that will save you from AI snake oil: context windows, hallucinations and agents

Most of the questionable AI advice doing the rounds in 2026 falls apart the moment you understand three concepts: the context window, why models hallucinate, and what an AI agent actually is. Get these straight and you can tell a real builder from a confident pitch deck inside ten minutes.

We use this exact filter on every supplier conversation we sit in on. It is not a research paper. It is a survival kit for owners who have to make spend decisions next quarter without a CTO in the room.

1. The context window: a model's working memory

A large language model does not “remember” you between conversations. Each time it generates a reply it reads everything in front of it - the system prompt, the chat history, any documents you attached - in one go and predicts the next word. That whole pile is the context, and the context window is the most the model can read at once.

The window is measured in tokens, not characters. A token is roughly three quarters of an English word. “The plumber arrived” is about 4 tokens. A typical contract is around 5,000 tokens.
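The three-quarters rule above can be turned into a quick estimator. This is a minimal sketch, assuming a plain whitespace word count; `estimate_tokens` and `WORDS_PER_TOKEN` are illustrative names, and real tokenisers will give somewhat different counts depending on the model and the text.

```python
import math

# Rule of thumb from the text: one token is roughly 0.75 English words.
# This is an approximation, not a real tokeniser.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate based on a whitespace word count."""
    word_count = len(text.split())
    return math.ceil(word_count / WORDS_PER_TOKEN)


print(estimate_tokens("The plumber arrived"))  # 3 words -> 4
```

For budgeting, treat the result as a ballpark and confirm real counts with the token counter your model provider supplies.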

[Diagram: the system prompt (rules + persona), the chat history (every turn so far) and your message (plus any attached docs) are read in one pass to produce the reply, a next-token guess.]

Modern frontier models have huge windows by historic standards - Anthropic's Claude family runs at 200,000 tokens, Google Gemini at 1-2 million, the open Llama 4 family at 1 million or more, depending on the model. That sounds infinite. It is not.

Why size isn't everything

  • Recall degrades with depth. The industry-standard “needle in a haystack” benchmark shows that even leading models can miss specific facts buried in the middle of a million-token document. Front and back are usually fine. The middle leaks.
  • You pay for every token. Pricing is per million input tokens. A 500-page PDF chucked into a chat is roughly 200,000 input tokens - per request. Do that ten times a day and the bill adds up fast.
  • Latency scales with input. Bigger contexts mean slower replies. Real-time chat with a 1M-token context is not an experience anyone enjoys.
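The cost point above is simple arithmetic, sketched here. The 300 words per page and the price of 3.00 per million input tokens are assumptions chosen for illustration, not any provider's actual rate; the function names are hypothetical too.

```python
def pages_to_tokens(pages: int, words_per_page: int = 300) -> int:
    """Rough token count for a document, using ~0.75 words per token.

    words_per_page=300 is an assumption; dense legal text runs higher.
    """
    return round(pages * words_per_page / 0.75)


def daily_input_cost(tokens_per_request: int, requests_per_day: int,
                     price_per_million: float) -> float:
    """Input-token cost per day. Output tokens are billed separately."""
    return tokens_per_request / 1_000_000 * price_per_million * requests_per_day


tokens = pages_to_tokens(500)               # the 500-page PDF from the text
cost = daily_input_cost(tokens, 10, 3.00)   # 3.00 per million is hypothetical
print(tokens, f"{cost:.2f}")
```

Swap in your supplier's published price to see what re-sending the same document on every request really costs.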

2. Hallucinations: not a bug, a property

A hallucination is when a model confidently states something false. It makes up a citation, invents a phone number, names the wrong director of a company. The temptation is to call it a bug and wait for a fix. That misreads what the model is doing.

A language model is not a database. It is a probability machine that produces the most plausible-looking next token given everything before it. “Plausible-looking” and “true” are correlated, not identical.

Hallucinations are baked in. They cannot be eliminated, only reduced - and the techniques for reducing them are well understood. If a vendor tells you their model “does not hallucinate”, you are being sold something.

What actually reduces hallucinations

  • Retrieval (RAG). Pull the answer from your own documents and pass it to the model as context. Force the model to quote. Biggest single lever.
  • Tool use. Let the model call a calculator, a database, an API. Maths and lookups stop being guessed.
  • Structured output. Constrain replies to a JSON schema you define. The model can still make up values - but it can't invent fields.
  • Citations. Demand the model attach a source for every claim. Forces grounding and makes review fast.
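The retrieval and citation levers can be sketched in a few lines. This is a toy illustration under stated assumptions: retrieval here is naive word overlap (production systems typically use embeddings or a search index), the documents are made up, and `retrieve`, `build_prompt` and `quote_is_grounded` are hypothetical helper names. No model is called; the sketch only shows what goes in and how a quoted answer could be checked.

```python
import re


def words(text: str) -> set[str]:
    """Lower-cased set of alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: dict[str, str],
             top_k: int = 2) -> list[tuple[str, str]]:
    """Rank documents by word overlap with the question (naive stand-in)."""
    q = words(question)
    ranked = sorted(documents.items(),
                    key=lambda item: len(q & words(item[1])),
                    reverse=True)
    # Drop documents that share no words with the question at all.
    return [(doc_id, text) for doc_id, text in ranked[:top_k] if q & words(text)]


def build_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Put the retrieved passages in front of the model and demand quotes."""
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return ("Answer using only the sources below. Quote the supporting "
            "sentence verbatim and give its source id. If the sources do "
            "not contain the answer, say so.\n\n"
            f"Sources:\n{sources}\n\nQuestion: {question}")


def quote_is_grounded(quote: str, source_id: str,
                      documents: dict[str, str]) -> bool:
    """True only if the quote appears verbatim in the cited source."""
    quote = quote.strip()
    return bool(quote) and quote in documents.get(source_id, "")


# Made-up documents, for illustration only.
documents = {
    "handbook-p4": "Call-out fees are waived for customers on an annual service plan.",
    "pricing-2026": "The standard call-out fee is 80 pounds plus VAT.",
    "holiday-rota": "The office is closed on bank holidays.",
}
question = "What is the standard call-out fee?"
passages = retrieve(question, documents)
print(build_prompt(question, passages))
```

The check in `quote_is_grounded` is what makes review fast: a reply whose quote does not appear in the cited source can be rejected automatically before a person ever reads it.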

3. Agents: a working definition for non-engineers

“AI agent” is the most stretched term in software right now. The clean definition we work to:

An agent is a language model that, given a goal, decides which tools to call, calls them, reads the results, and decides what to do next - in a loop, until it judges the goal met.

The loop is the whole game:

[Diagram: Goal (from you) → Plan (LLM decides next step) → Act (call a tool) → Observe (read the result) → Done? (model judges); if not done, repeat from Plan.]

A chatbot that just talks back at you is not an agent. A workflow with if-this-then-that rules is not an agent. An assistant that can read your inbox, draft a quote, check the calendar, and book a follow-up - choosing each step itself - is an agent.
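The plan, act, observe loop in the definition can be sketched as follows. This is a minimal illustration, not a production design: `run_agent` and the tool names are hypothetical, and `scripted_decide` is a fixed stand-in used only so the sketch runs without a model. In a real agent that one call is where the language model chooses the next step; a hard-coded version like this one would be exactly the if-this-then-that workflow described above.

```python
from typing import Callable

Tool = Callable[[str], str]
Step = tuple[str, str, str]  # (action, input, observation)


def run_agent(goal: str,
              decide: Callable[[str, list[Step]], dict],
              tools: dict[str, Tool],
              max_steps: int = 5) -> list[Step]:
    """Loop: plan, act, observe, until the decider says done or steps run out."""
    history: list[Step] = []
    for _ in range(max_steps):          # hard cap so the loop cannot run forever
        step = decide(goal, history)    # Plan: in a real agent, an LLM call
        if step["action"] == "done":    # Done? the decider judges the goal met
            break
        tool = tools.get(step["action"])
        if tool is None:
            observation = f"unknown tool: {step['action']}"
        else:
            observation = tool(step["input"])   # Act: call the chosen tool
        history.append((step["action"], step["input"], observation))  # Observe
    return history


def scripted_decide(goal: str, history: list[Step]) -> dict:
    """Hypothetical stand-in for the model, so the sketch runs offline."""
    used = [action for action, _, _ in history]
    if "check_calendar" not in used:
        return {"action": "check_calendar", "input": "next week"}
    if "draft_quote" not in used:
        return {"action": "draft_quote", "input": "boiler service"}
    return {"action": "done", "input": ""}


# Toy tools that return canned strings instead of touching real systems.
tools = {
    "check_calendar": lambda query: f"free slot found for {query}: Tuesday 10:00",
    "draft_quote": lambda job: f"draft quote prepared for {job}",
}

history = run_agent("Quote and schedule a boiler service", scripted_decide, tools)
for action, tool_input, observation in history:
    print(action, "->", observation)
```

The `max_steps` cap is worth asking a vendor about: an agent that judges its own completion needs an external limit on how long it is allowed to keep going.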

Where agents work, and where they don't

Good fit | Bad fit | Why
Triaging support tickets | Approving refunds without review | Reversible vs irreversible
Drafting weekly reports | Sending payments | Cost of being wrong
Researching prospects | Negotiating contracts | Tribal knowledge can't be retrieved
Code refactors with tests | Production deploys without review | Tests are the safety net
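One way to act on the reversible/irreversible split is an approval gate between the agent and its tools. This is a sketch under assumptions: the action names, the `IRREVERSIBLE` set and the `execute` function are all hypothetical, and a real system would record who approved what rather than return strings.

```python
from typing import Optional

# Hypothetical action names; the list of irreversible actions is yours to define.
IRREVERSIBLE = {"send_payment", "approve_refund", "deploy_production"}


def execute(action: str, payload: str, approved_by: Optional[str] = None) -> str:
    """Run reversible actions directly; hold irreversible ones for a human."""
    if action in IRREVERSIBLE and approved_by is None:
        return f"HELD for human review: {action}({payload})"
    return f"executed: {action}({payload})"


print(execute("draft_report", "week 19"))
print(execute("send_payment", "supplier invoice"))
print(execute("send_payment", "supplier invoice", approved_by="finance lead"))
```

The design choice is that the gate lives outside the model: the agent can propose a payment, but it cannot approve its own proposal.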

The bigger picture

  • 3 concepts to filter 90% of AI sales decks
  • 200K tokens of Claude context, roughly 150K words
  • >0% hallucination floor: reducible, never zero

None of this is a brake on using AI. It is a brake on buying it badly. The teams that get value from AI in 2026 share three habits: they retrieve their own data instead of stuffing it into prompts, they treat every model output as a draft to be checked, and they put humans in front of any irreversible action an agent can take.

That is the whole playbook. Most of the rest is decoration.
