LLMs struggle with something as simple as counting letters in “strawberry.” Sounds absurd, right? But this “strawberry problem” isn’t just a random quirk. It’s a fundamental flaw in how today’s models process text.
The culprit? Tokenization.
Tokenization, and in particular byte pair encoding (BPE), the scheme used by most LLMs, fragments words into pieces the model treats as opaque units rather than sequences of letters.
Instead of treating words as whole units, BPE splits them into subword pieces based on how frequently those pieces appear in the training data. Some words, like “banana,” are common enough to survive as a single token, while others, like “strawberry,” get broken into several tokens, and the split can even change with capitalization or surrounding whitespace. As a result, when an LLM tries to count letters within a word, it never sees the full word at once; it is reasoning over fragments.
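You can see this fragmentation directly with a few lines of Python. The sketch below uses the tiktoken library (an assumption on my part; any BPE tokenizer would work) and the `cl100k_base` vocabulary to print how each word gets split. The exact pieces depend on the vocabulary you load, so treat the output as illustrative rather than definitive.

```python
# Minimal sketch: inspect how a BPE vocabulary splits individual words.
# Assumes the tiktoken package is installed (pip install tiktoken);
# "cl100k_base" is one widely used vocabulary, other tokenizers will split differently.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for word in ["banana", "strawberry", " strawberry", "Strawberry"]:
    token_ids = enc.encode(word)
    # Decode each token id back to its bytes to see the actual subword pieces.
    pieces = [
        enc.decode_single_token_bytes(t).decode("utf-8", errors="replace")
        for t in token_ids
    ]
    print(f"{word!r}: {len(token_ids)} token(s) -> {pieces}")
```

On many common vocabularies a word like “strawberry” comes back as two or three pieces, and the leading-space and capitalized variants split differently again, which is exactly the inconsistency that makes letter-level questions hard for the model.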
This limitation of tokenization has persisted for years, but Meta’s new Byte Latent Transformer (BLT) might change everything.