How Models Actually Work
Explore how AI language models process input through tokens, manage information within a context window, and operate in different model tiers. Understand the causes of AI errors like hallucinations and forgetting, and learn how system prompts and extended thinking influence AI behavior to better control your app development process.
PawPals has been going well. By this stage, you are comfortable with the iteration loop and have built real features across two different tools. Then three things happen in the same afternoon.
First, you ask the AI to update the booking confirmation page, and it rewrites the navigation bar you told it never to touch. You already said “do not change the navigation bar” four messages ago. The AI seems to have forgotten.
Second, you ask the AI to add a payment integration, and it confidently references a Stripe function called createBookingCharge. That function does not exist, which you know because you looked it up, but the AI wrote it with the same confidence it writes everything else.

Third, you switch to a different model because someone told you it was “better.” The response takes 45 seconds instead of 3, and the answer is no better than what you were getting before. You just waited fifteen times longer for the same result.
None of these are bugs, and none of them are random. They are predictable behaviors of how language models work, and once you understand the mechanics, you can work around every one. Six concepts explain almost everything you will encounter: tokens, the context window, model tiers, extended thinking, temperature, and system prompts.
What are tokens?
When you type a message to the AI, it does not read your words the way you do. It breaks your text into tokens: small chunks of text that the model processes as individual units.
A token is roughly three-quarters of a word:
Short common words like “the” or “is” → one token each
Longer words get split: “confirmation” → confirm + ation
Code is tokenized differently: const walkerName = "Sarah" → five or six tokens
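To make the splitting concrete, here is a minimal sketch of subword tokenization. Real tokenizers (such as the byte-pair-encoding tokenizers production models use) learn their vocabulary from data; the tiny vocabulary and the `toy_tokenize` function below are purely hypothetical, chosen just to mirror the “confirmation” example above.

```python
# A toy subword tokenizer: greedy longest-match against a fixed vocabulary.
# The vocabulary here is invented for illustration; real models learn
# theirs (tens of thousands of entries) from large text corpora.
TOY_VOCAB = {"confirm", "ation", "the", "is", "book", "ing"}

def toy_tokenize(text):
    """Split text into the longest known chunks, left to right."""
    tokens = []
    for word in text.split():
        i = 0
        while i < len(word):
            # Try the longest slice starting at position i that is in the vocab.
            for j in range(len(word), i, -1):
                if word[i:j] in TOY_VOCAB:
                    tokens.append(word[i:j])
                    i = j
                    break
            else:
                tokens.append(word[i])  # unknown character becomes its own token
                i += 1
    return tokens

print(toy_tokenize("the confirmation"))  # → ['the', 'confirm', 'ation']
```

Note that “confirmation” costs two tokens while “the” costs one, which is why token counts do not line up with word counts.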
Why does this matter? Because tokens are the currency of every interaction. ...