Managing Character Encodings

Learn how to manage and implement character encodings in PHP.

We'll cover the following

楮⁃桡灴敲‱Ⱐ睥⁨慤ⁱ畩瑥⁡敮杴桹⁤楳捵獳楯渠慢潵琠䅓䍉䤠慮搠啔䘭㠮⁔桥獥⁡牥⁥硡浰汥猠潦⁣桡牡捴敲⁥湣潤楮杳Ⱐ扵琠瑨敲攠慲攠浡湹潲攮⁗桡琠桡灰敮敤⁡琠瑨攠扥杩湮楮朠潦⁴桩猠灡牡杲慰栠楳⁡渠數慭灬攠潦潪楢慫攠ⴠ瑨攠牥獵氍ਠ潦⁰牯捥獳楮—what went wrong there? What should have appeared there was that so far in this chapter, we have had quite a lengthy discussion about ASCII and UTF-8. These are examples of character encodings, but there are many more. What happened at the beginning of this paragraph is an example of mojibake—the result of processing text in a different character encoding than the one used to write it. The specific example here was UTF-8 encoded content interpreted as a sequence of bytes encoded using UTF-16 BE. While this particular example was purposely done and is benign, having this happen on accident can have consequences for application data flow, user experience, and breaking layouts, amongst other things. In extreme cases, this process can unintentionally change the meaning of a message entirely when read by the right reader, such as taking an ordinary conversation about character encodings and producing a disjointed equine conversation.

Get hands-on with 1200+ tech skills courses.