Exercise: Counting Unicode Characters
Learn how to extend a Go command-line application so that its count function counts Unicode characters accurately using the rune type. You'll handle input text that includes multibyte characters beyond ASCII, so the word counter works correctly with text in many languages. This exercise gives you practice processing and counting characters in Go.
Challenge
In this exercise, your challenge is to expand the count function to count Unicode characters.
Problem statement
In addition to counting lines and words, your tool should also be able to count the number of Unicode characters in the input.
Computers encode text using different standards. Text encoded in ASCII uses one byte per character, so counting the bytes is usually enough to count the characters. This works for text that fits entirely in ASCII, such as most English text, but it gives the wrong answer for text in languages such as Japanese. Those characters are defined by the Unicode standard and, in an encoding such as UTF-8, a single character can take more than one byte.
Go provides the rune type to represent Unicode code points (characters). Expand the program to count runes in addition to words and lines.
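To make the difference between bytes and characters concrete, here is a small sketch that compares the two counts for an ASCII string and a Japanese string. It assumes UTF-8 text (Go source files and string literals are UTF-8), and the helper name byteAndRuneCounts is made up for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCounts is a hypothetical helper for this illustration.
// It returns the number of bytes and the number of Unicode code
// points (runes) in s.
func byteAndRuneCounts(s string) (int, int) {
	// len reports bytes; utf8.RuneCountInString decodes the UTF-8
	// sequence and reports code points.
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{"hello", "こんにちは"} {
		b, r := byteAndRuneCounts(s)
		fmt.Printf("%q: %d bytes, %d runes\n", s, b, r)
	}
}
```

Both strings contain five characters, but the Japanese one occupies 15 bytes because each of its characters takes three bytes in UTF-8.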
Coding challenge
Take some time to figure out the smartest way to solve this problem. Start from the implementation of the count function at the end of this chapter. Expand the count function to receive a new boolean parameter named countRunes. If this parameter is set to true, the function should return the number of runes in the provided input text.
If you feel stuck, refer to Go’s documentation for runes or for the bufio package. If you still need help, check the solution review in the next lesson. Good luck!
Note: If you're looking for an extra challenge, write a function for counting bytes in addition to Unicode characters.
package main

import (
	"bufio"
	"flag"
	"fmt"
	"io"
	"os"
)

func main() {
	// Defining the boolean flags: -l counts lines and -r counts runes
	// instead of words
	countLines := flag.Bool("l", false, "Count lines")
	countRunes := flag.Bool("r", false, "Count runes")

	// Parsing the flags provided by the user
	flag.Parse()

	// Calling the count function to count words, runes, or lines.
	// If both flags are set, -r takes precedence.
	if !*countLines && !*countRunes {
		fmt.Println("The content has", count(os.Stdin, *countLines, *countRunes), "number of words.")
	} else if *countRunes {
		fmt.Println("The content has", count(os.Stdin, *countLines, *countRunes), "number of runes.")
	} else {
		fmt.Println("The content has", count(os.Stdin, *countLines, *countRunes), "number of lines.")
	}
}

func count(r io.Reader, countLines, countRunes bool) int {
	// Write your code here.
	// This skeleton will not compile until you declare wc and use the
	// bufio package.
	return wc
}