Iterating Multibyte Strings
Explore techniques to iterate multibyte UTF-8 strings correctly in PHP using Laravel. Understand issues with standard byte iteration, memory implications of common methods, and how to implement custom UTF-8 string length and iteration functions for efficient string manipulation.
UTF-8 string iterator implementation contains the full implementation for the class we will develop throughout this chapter.
Issues with multibyte iteration
In the previous lesson, we implemented several string functions that worked by iterating a string’s characters. Treating each byte as a single character is fine when working with ASCII, but as soon as we need to work with multibyte characters, things become more complicated. As an example, suppose we attempted to iterate “這可以” one byte at a time, printing each byte's index followed by the byte itself as though it were a complete character.
Our output string would appear corrupted:
0: Ú1: Ç2: Ö3: Õ4: ...
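To make the cause of the corruption concrete, the following minimal sketch walks the same string byte by byte and prints each byte's hexadecimal value instead of echoing it raw. The helper name `describeBytes` is hypothetical and is not part of the lesson's class. The sketch also assumes the source file is saved as UTF-8.

```php
<?php

// Hypothetical helper (not from the lesson): lists every byte of a string
// together with its offset, formatted as two uppercase hexadecimal digits.
function describeBytes(string $value): array
{
    $lines = [];
    $length = strlen($value); // strlen() counts bytes, not characters

    for ($i = 0; $i < $length; $i++) {
        // $value[$i] yields a single byte, which may be only one piece
        // of a multibyte character.
        $lines[] = sprintf('%d: %02X', $i, ord($value[$i]));
    }

    return $lines;
}

// Assumes this source file is saved as UTF-8.
$string = '這可以';

echo strlen($string), PHP_EOL; // byte count, not character count
echo implode(' ', describeBytes($string)), PHP_EOL;
```

Under the UTF-8 assumption, `strlen` reports 9 for this three-character string, because each character is encoded as three bytes (for example, “這” is `E9 80 99`). Echoing any one of those bytes on its own produces an invalid UTF-8 sequence, which is why the output above looks corrupted.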