Double-quoted Strings Are Binaries
Explore the nature of double-quoted strings in Elixir as UTF-8 encoded binaries. Understand their memory implications, how to work with string functions like length, codepoints, and pattern matching, and practice key String module operations for efficient string handling.
Introduction
Unlike single-quoted strings, which are stored as lists of character codes, the contents of a double-quoted string (dqs) are stored as a consecutive sequence of bytes in UTF-8 encoding. This is more efficient in terms of memory and certain forms of access, but it does have two implications.
- First, because UTF-8 characters can take more than a single byte to represent, the size of the binary is not necessarily the length of the string.
  iex> dqs = "∂x/∂y"
  "∂x/∂y"
  iex> String.length dqs
  5
  iex> byte_size dqs
  9
  iex> String.at(dqs, 0)
  "∂"
  iex> String.codepoints(dqs)
  ["∂", "x", "/", "∂", "y"]
  iex> String.split(dqs, "/")
  ["∂x", "∂y"]
- Second, because we’re no longer using lists, we need to learn and work with the binary syntax alongside the list syntax in our code.
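Both implications can be sketched in a small script. The module name `DqsDemo` and its two functions are hypothetical, chosen only for illustration. The calls they rely on (`String.length/1`, `byte_size/1`, and the `utf8` binary segment type) are standard Elixir. The sketch assumes valid UTF-8 input; an invalid binary would not match either clause of `first_codepoint/1`.

```elixir
defmodule DqsDemo do
  # Returns {grapheme_count, byte_count} for a double-quoted string.
  def sizes(str) when is_binary(str) do
    {String.length(str), byte_size(str)}
  end

  # Splits off the first UTF-8 codepoint using binary pattern matching.
  # Returns {first_codepoint_as_string, rest}, or :empty for "".
  def first_codepoint(<<head::utf8, rest::binary>>), do: {<<head::utf8>>, rest}
  def first_codepoint(<<>>), do: :empty
end

IO.inspect(DqsDemo.sizes("∂x/∂y"))            # {5, 9}
IO.inspect(DqsDemo.first_codepoint("∂x/∂y"))  # {"∂", "x/∂y"}
```

The `<<head::utf8, rest::binary>>` pattern plays the same role for binaries that `[head | tail]` plays for lists, which is why the binary syntax is worth learning alongside the list syntax.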
Strings and Elixir libraries
When Elixir library documentation uses the word “string” (and most of the time it uses the word “binary”), it means double-quoted strings. The String module defines functions that work with double-quoted strings. Let’s cover some of them below with examples.
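As a quick check of that terminology, the following sketch uses `is_binary/1` and `is_list/1` to show that a double-quoted string is a binary, while a charlist is a list. The charlist is written here with the `~c` sigil, which is equivalent to the single-quoted form; the variable names are illustrative only.

```elixir
dqs = "cat"       # double-quoted string: a UTF-8 binary
chars = ~c"cat"   # charlist: a list of integer codepoints

IO.inspect(is_binary(dqs))     # true
IO.inspect(is_list(chars))     # true
IO.inspect(is_binary(chars))   # false

# String functions expect binaries, so convert a charlist first.
IO.inspect(String.upcase(List.to_string(chars)))   # "CAT"
```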
- at(str, offset) returns the grapheme at the given offset (starting at 0). Negative offsets count from the end of the string.
  iex> String.at("∂og", 0)
  "∂"
  iex> String.at("∂og", -1)
  "g"
- capitalize(str) converts str to lowercase, and then capitalizes the first character.
  iex> String.capitalize "école"
  "École"