Implicit Conversions

Explore the impact of implicit conversions on Unicode and octet sequences in Perl. Understand how different string encodings interact during concatenation and the common pitfalls that arise. Learn the importance of decoding input and encoding output properly to avoid subtle bugs in handling Unicode data.

We'll cover the following...

Unicode problems
Concatenation
An example to illustrate the point

Unicode problems

Most Unicode problems in Perl arise because a string could be either a sequence of octets or a sequence of characters. Perl allows us to combine these types through the use of implicit conversions. When these conversions are wrong, they’re rarely obviously wrong, but they’re often spectacularly wrong in ways that are difficult to debug.

Concatenation

When Perl concatenates a sequence of octets with a sequence of Unicode characters, it implicitly decodes the octet sequence using the Latin-1 encoding. The resulting string will contain Unicode characters. When we print a string that contains characters outside Latin-1, Perl will encode it using UTF-8, because Latin-1 covers only the first 256 Unicode code points and can’t represent the entire set of Unicode characters.
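The implicit Latin-1 decoding described above can be seen in a minimal sketch. The helper names `naive_join`, `decoded_join`, and `code_points` are hypothetical and exist only for this illustration, and the sketch assumes the octets happen to be UTF-8-encoded text, as they might be when read from a file or socket without decoding.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: concatenate without decoding, so Perl upgrades
# each octet as if it were a Latin-1 character.
sub naive_join {
    my ($chars, $octets) = @_;
    return $chars . $octets;
}

# Hypothetical helper: decode the octets as UTF-8 first, then concatenate.
sub decoded_join {
    my ($chars, $octets) = @_;
    return $chars . decode('UTF-8', $octets);
}

# Hypothetical helper: list the code points in a string for inspection.
sub code_points {
    my ($string) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $string;
}

my $greeting = "\x{263A} ";      # character string: WHITE SMILING FACE, space
my $name     = "Jos\xC3\xA9";    # octets: "José" encoded as UTF-8

print code_points(naive_join($greeting, $name)), "\n";
# U+263A U+0020 U+004A U+006F U+0073 U+00C3 U+00A9

print code_points(decoded_join($greeting, $name)), "\n";
# U+263A U+0020 U+004A U+006F U+0073 U+00E9
```

In the first result, the two octets that encode é have each become a separate character (U+00C3 and U+00A9, which display as "Ã©"). In the second, decoding before concatenation yields the single character U+00E9. An ASCII-only name would produce identical results either way, which is why this kind of bug tends to go unnoticed.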
