Implicit Conversions

Learn how Perl concatenates different types of strings using implicit conversion.

Unicode problems

Most Unicode problems in Perl arise because a string could be either a sequence of octets or a sequence of characters. Perl allows us to combine these types through the use of implicit conversions. When these conversions are wrong, they’re rarely obviously wrong, but they’re often spectacularly wrong in ways that are difficult to debug.

Concatenation

When Perl concatenates a sequence of octets with a sequence of Unicode characters, it implicitly decodes the octet sequence using the Latin-1 encoding. The resulting string will contain Unicode characters. When we print Unicode characters, Perl will encode the string using UTF-8, since Latin-1 can’t represent the entire set of Unicode characters: Latin-1 covers only the first 256 Unicode code points.
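The following is a minimal sketch of that behavior, not code from the lesson. The variable names and sample strings are illustrative assumptions. It uses the core Encode module's decode function to contrast the implicit Latin-1 conversion with an explicit UTF-8 decode.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Encode printed characters as UTF-8 explicitly.
binmode STDOUT, ':encoding(UTF-8)';

# Octets: the UTF-8 encoding of "é" (U+00E9) is the two bytes 0xC3 0xA9.
my $octets = "caf\xC3\xA9";

# Characters: a string containing U+263A, which does not fit in one byte.
my $chars = " \x{263A}";

# Concatenation treats each octet as a Latin-1 character, so the two
# bytes of "é" become two separate characters, U+00C3 and U+00A9.
my $mixed = $octets . $chars;

# Decoding the octets explicitly before concatenating keeps "é" intact.
my $fixed = decode('UTF-8', $octets) . $chars;

printf "mixed: %d characters\n", length $mixed;
printf "fixed: %d characters\n", length $fixed;
print "$mixed\n";
print "$fixed\n";
```

Under these assumptions, $mixed is one character longer than $fixed and prints with two wrong characters where the "é" should be. Perl raises no error or warning at the point of concatenation, which is why this kind of bug is easy to miss.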
