Puzzle 5: Explanation

Explore how Rust represents strings internally as UTF-8 encoded byte vectors. Understand the distinction between bytes and Unicode characters, learn how to count and access characters safely using iterators, and discover the memory implications of handling Unicode text in Rust.

Explanation

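The program above reports a length of $13$, yet “Halló heimur” contains only $12$ characters (including the space). The reported length counts bytes, not characters, and “ó” takes two bytes. Let’s step back and look at how Rust’s String type works.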
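To make the difference between bytes and characters concrete, here is a minimal sketch. The helper names `byte_len` and `char_count` are hypothetical, introduced only for illustration. The literal spells “ó” as the escape `\u{f3}`, which assumes the precomposed code point U+00F3 rather than “o” followed by a combining accent.

```rust
/// Number of bytes in the UTF-8 encoding (this is what `len()` reports).
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Number of Unicode scalar values (`char`s), counted with an iterator.
fn char_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // Assumption: "ó" is the single precomposed code point U+00F3.
    let greeting = "Hall\u{f3} heimur";

    println!("bytes: {}", byte_len(greeting)); // 13
    println!("chars: {}", char_count(greeting)); // 12
}
```

Note that `chars().count()` walks the whole string, so it takes time proportional to the string's length, while `len()` simply reads the stored byte length.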

Internally, the String struct has a very simple definition.

pub struct String { 
    vec: Vec<u8>,
}

A String is just a vector of bytes (u8) that holds Unicode characters in an encoding called UTF-8. Rust source files are themselves UTF-8, so string literals are stored in that encoding, and a String always contains valid UTF-8.
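As a rough sketch of what that byte vector holds, the snippet below prints how many bytes each character needs and then dumps the raw bytes in hexadecimal. The helper `utf8_bytes` is a hypothetical name used only for this example, and “ó” is again assumed to be the precomposed code point U+00F3.

```rust
/// Copies the UTF-8 bytes backing `s` into an owned vector.
fn utf8_bytes(s: &str) -> Vec<u8> {
    s.as_bytes().to_vec()
}

fn main() {
    // Assumption: "ó" is the single precomposed code point U+00F3.
    let word = "Hall\u{f3}";

    // ASCII letters need one byte each; "ó" needs two.
    for ch in word.chars() {
        println!("{:?} -> {} byte(s)", ch, ch.len_utf8());
    }

    // Raw bytes in hex: [48, 61, 6c, 6c, c3, b3]
    println!("{:02x?}", utf8_bytes(word));
}
```

Five characters occupy six bytes here, because “ó” is encoded as the two-byte sequence `c3 b3`.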

The illustration below shows us what the encoding looks like:
