As long as things go well, nobody thinks about pretrained tokenizers. It's like driving a car: we can drive for years without ever thinking about the engine. Then, one day, the car breaks down, and we suddenly start looking for the cause.

The same goes for pretrained tokenizers. Sometimes their output is not what we expect. For example, some word pairs just don’t fit together, as we can see in the figure below:
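One common source of such surprises is how a subword tokenizer splits words that are not in its vocabulary. The sketch below is a minimal, hypothetical greedy longest-match tokenizer (in the spirit of WordPiece-style tokenization, not any specific library's implementation); the vocabulary and words are made up for illustration:

```python
# Minimal sketch of greedy longest-match subword tokenization.
# The vocabulary and example words below are hypothetical.
def tokenize(word, vocab):
    """Split a word into the longest subword pieces found in vocab."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it matches a known subword.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No known subword starts here: emit an unknown marker.
            tokens.append("[UNK]")
            start += 1
        else:
            tokens.append(word[start:end])
            start = end
    return tokens

vocab = {"token", "iz", "er", "un", "expect", "ed"}
print(tokenize("tokenizer", vocab))   # ['token', 'iz', 'er']
print(tokenize("unexpected", vocab))  # ['un', 'expect', 'ed']
```

A word the tokenizer has never seen whole is broken into smaller pieces, and those pieces do not always align with the word's actual morphology, which is one reason tokenizer output can look odd.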
