
BLT (More Than A Sandwich 🥪): Meta’s New Byte Latent Transformer

Could BLT finally spell the end of tokenization-based models? Let’s explore what BLT brings to the table.
7 min read
Mar 10, 2025

LLMs struggle with something as simple as counting how many r’s are in “strawberry.” Sounds absurd, right? But this “strawberry problem” isn’t just a random quirk. It’s a fundamental flaw in how today’s models process text.

The culprit? Tokenization.

Tokenization — especially methods like byte pair encoding (BPE), which is used by most LLMs — introduces fragmentation that can distort how LLMs process text.

Instead of treating words as whole units, BPE splits them into subword pieces based on how frequently those pieces appear in the training data. Some words, like “banana,” are common enough to survive as a single token, while others, like “strawberry,” get broken into multiple tokens, often at inconsistent split points. As a result, when an LLM tries to count letters within a word, it isn’t seeing the full word at once — it’s reasoning over fragmented pieces.
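
To see this fragmentation concretely, here is a minimal sketch using the open source tiktoken library (one common BPE implementation). The exact token boundaries depend on which vocabulary you load, so treat the output as illustrative rather than definitive.

```python
# Illustrative sketch: inspect how a BPE vocabulary splits words into subword tokens.
# Uses the open source tiktoken library; exact splits depend on the vocabulary chosen.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one widely used BPE vocabulary

for word in ["banana", "strawberry"]:
    token_ids = enc.encode(word)
    pieces = [enc.decode([tid]) for tid in token_ids]
    print(f"{word!r} -> {len(token_ids)} token(s): {pieces}")
```

Running a sketch like this typically shows “strawberry” split into several subword pieces, while shorter, more frequent strings survive as single tokens — which is exactly why letter-level reasoning over tokens goes wrong.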

This inefficiency in tokenization has persisted for years, but Meta’s new Byte Latent Transformer (BLT) might change everything.


Written By: Fahim ul Haq