CSV Parsing: The CSV Parser
Explore how to implement a CSV parser focusing on encoding and decoding processes in Erlang. Understand handling tricky cases and ambiguities in the CSV format using property-based testing with PropEr. Learn to write example tests alongside properties for reliable and robust CSV parsing.
The CSV parser
We can now move on to implementing a CSV parser. Here is a possible implementation:
First, there’s the public interface with two functions: encode/1 and decode/1.
The functions are fairly straightforward, delegating the more complex operations to private helper functions. Let’s start by looking at those helping with encoding:
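A minimal sketch of what such encoding helpers might look like. The module name csv_escape_sketch and the helper names escape/1 and do_escape/1 are hypothetical. The sketch also assumes that escapable/1 flags any string containing a double quote, a comma, or a line break, which the text does not spell out.

```erlang
-module(csv_escape_sketch).
-export([escape/1, escapable/1]).

%% Return the field unchanged, or wrapped in double quotes
%% (with inner quotes doubled) when it needs escaping.
-spec escape(string()) -> string().
escape(Field) ->
    case escapable(Field) of
        true -> "\"" ++ do_escape(Field) ++ "\"";
        false -> Field
    end.

%% Assumption: a string needs escaping if it contains a double quote,
%% a comma, a carriage return, or a line feed.
-spec escapable(string()) -> boolean().
escapable(String) ->
    lists:any(fun(Char) -> lists:member(Char, [$", $,, $\r, $\n]) end,
              String).

%% Double every double quote. The surrounding quotes are not added
%% here; the caller (escape/1) is responsible for them.
-spec do_escape(string()) -> string().
do_escape([]) -> [];
do_escape([$" | Rest]) -> [$", $" | do_escape(Rest)];
do_escape([Char | Rest]) -> [Char | do_escape(Rest)].
```

Under these assumptions, csv_escape_sketch:escape("abc") returns the string unchanged, while csv_escape_sketch:escape("a,b") returns the same text wrapped in double quotes.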
If a string is judged to need escaping (according to escapable/1), then the string is wrapped in double quotes (") and all double quotes inside of it are escaped with another double quote. With this, encoding is covered. Next, there are decoding’s private functions:
Decoding is done by fetching the headers, then fetching all of the rows. A header line is parsed by reading each column name one at a time, and a row is parsed by reading each field one at a time. At the end we can see that both fields and names are actually implemented as quoted or unquoted strings:
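The decoding flow described above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the module name csv_decode_sketch and all helper names are hypothetical, decode/1 is assumed to return one map per row keyed by header name, lines are assumed to be separated by CRLF, and the input is assumed to be well formed with at least one row after the header.

```erlang
-module(csv_decode_sketch).
-export([decode/1]).

%% Assumption: the result is a list of maps, one per row,
%% with the header names as keys.
-spec decode(string()) -> [map()].
decode("") -> [];
decode(CSV) ->
    {Headers, Rest} = decode_header(CSV, []),
    Rows = decode_rows(Rest),
    [maps:from_list(lists:zip(Headers, Row)) || Row <- Rows].

%% Read column names one at a time until the header line ends.
decode_header(String, Acc) ->
    case decode_name(String) of
        {ok, Name, Rest} -> decode_header(Rest, [Name | Acc]);
        {done, Name, Rest} -> {lists:reverse([Name | Acc]), Rest}
    end.

%% Read rows until the input is exhausted.
decode_rows(String) ->
    case decode_row(String, []) of
        {Row, ""} -> [Row];
        {Row, Rest} -> [Row | decode_rows(Rest)]
    end.

%% Read fields one at a time until the row ends.
decode_row(String, Acc) ->
    case decode_field(String) of
        {ok, Field, Rest} -> decode_row(Rest, [Field | Acc]);
        {done, Field, Rest} -> {lists:reverse([Field | Acc]), Rest}
    end.

%% Names and fields are both read as quoted or unquoted strings.
decode_name([$" | Rest]) -> decode_quoted(Rest, []);
decode_name(String) -> decode_unquoted(String, []).

decode_field([$" | Rest]) -> decode_quoted(Rest, []);
decode_field(String) -> decode_unquoted(String, []).

%% Quoted string: `ok' means more values follow on the same line,
%% `done' means the line (or the input) has ended.
decode_quoted([$"], Acc) -> {done, lists:reverse(Acc), ""};
decode_quoted([$", $\r, $\n | Rest], Acc) -> {done, lists:reverse(Acc), Rest};
decode_quoted([$", $, | Rest], Acc) -> {ok, lists:reverse(Acc), Rest};
decode_quoted([$", $" | Rest], Acc) -> decode_quoted(Rest, [$" | Acc]);
decode_quoted([Char | Rest], Acc) -> decode_quoted(Rest, [Char | Acc]).

%% Unquoted string: read up to the next comma, CRLF, or end of input.
decode_unquoted([], Acc) -> {done, lists:reverse(Acc), ""};
decode_unquoted([$\r, $\n | Rest], Acc) -> {done, lists:reverse(Acc), Rest};
decode_unquoted([$, | Rest], Acc) -> {ok, lists:reverse(Acc), Rest};
decode_unquoted([Char | Rest], Acc) -> decode_unquoted(Rest, [Char | Acc]).
```

Both readers return a tagged tuple so that the caller can tell whether the current line continues (ok) or has ended (done). Malformed input, such as a header with no rows or a row with the wrong number of fields, is not handled by this sketch and would crash in lists:zip/2.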
Both functions that read quoted or unquoted strings work ...