Unicode and Strings
Explore how Perl manages Unicode characters and binary strings, including character encodings, Unicode handling on filehandles, and proper encoding and decoding practices, so that you can write maintainable, reliable Perl code that handles diverse text data.
Unicode is a system used to represent the characters of the world’s written languages. Most English text fits in the ASCII character set of only 128 characters (each of which needs 7 bits of storage and fits nicely into an 8-bit byte), but it’s naïve to believe that we won’t someday need an umlaut.
Perl strings
Perl strings can represent either of two separate but related data types:
Sequences of Unicode characters
Each character has a codepoint, a unique number that identifies it in the Unicode character set.
Sequences of octets
Binary data in a sequence of octets—8-bit ...
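The difference between the two kinds of string can be seen in a minimal sketch like the one below. It assumes the core Encode module and UTF-8 as the encoding; the variable names are only illustrative. The same word is held first as a sequence of characters and then as a sequence of octets, and `length` reports a different count for each.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A character string: five Unicode characters.
# \x{EF} is the codepoint of the letter i with a diaeresis (U+00EF).
my $chars = "na\x{EF}ve";

# An octet string: the same text encoded as UTF-8 bytes (an assumed encoding).
my $octets = encode('UTF-8', $chars);

printf "characters: %d\n", length $chars;                # 5
printf "octets:     %d\n", length $octets;               # 6 (U+00EF takes two bytes)
printf "codepoint:  U+%04X\n", ord substr($chars, 2, 1); # U+00EF

# Decoding the octets yields the original characters again.
my $round_trip = decode('UTF-8', $octets);
print "round trip matches\n" if $round_trip eq $chars;
```

Perl does not record which kind of data a scalar holds, so the program has to keep track of whether a given string has been decoded yet.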