How to parse an XML file to CSV using Golang with dynamic schema
Converting XML files to CSV is common in data processing and transformation. XML is a frequently used format for representing data, while CSV is a simpler format often used for data interchange.
Go's standard library includes the encoding/xml package for parsing XML files. It lets us map XML elements and attributes to Go struct fields. To write the CSV data, we’ll use the encoding/csv package.
Dynamic schema handling
One challenge when converting XML to CSV is handling varying XML structures. XML files from different sources might have different tags and structures. To overcome this, we’ll implement a dynamic schema approach where the schema (headers) for the CSV file is determined dynamically based on the XML content. This is done through the following two steps:
Understanding the layout of the XML file
Identifying the pattern and repetition required in the CSV file
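As a rough illustration of these two steps, the sketch below derives CSV headers by collecting the union of child tag names across all records, in the order each name is first seen. This is only one possible strategy and an assumption of this example. The node type, the headersFromXML helper, and the inline sample document are hypothetical names introduced here for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// node is a hypothetical generic element type used only in this sketch.
type node struct {
	XMLName  xml.Name
	Children []node `xml:",any"`
	Text     string `xml:",chardata"`
}

// headersFromXML returns the union of child tag names across all records,
// in the order each name is first encountered.
func headersFromXML(data string) ([]string, error) {
	var root node
	if err := xml.Unmarshal([]byte(data), &root); err != nil {
		return nil, err
	}
	seen := make(map[string]bool)
	var headers []string
	for _, record := range root.Children {
		for _, field := range record.Children {
			name := field.XMLName.Local
			if !seen[name] {
				seen[name] = true
				headers = append(headers, name)
			}
		}
	}
	return headers, nil
}

func main() {
	// Assumed sample: two records with partly different tags.
	const sample = `<Entitys>
  <Entity><name>Sara</name><adress2>Line 2</adress2></Entity>
  <Entity><name>Smith</name><email>smith@example.com</email></Entity>
</Entitys>`
	headers, err := headersFromXML(sample)
	if err != nil {
		fmt.Println("Error decoding XML:", err)
		return
	}
	fmt.Println(headers) // prints [name adress2 email]
}
```

Collecting a union keeps every tag that appears anywhere in the file, at the cost of a column order that depends on which record is read first.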
Here, we work with an XML file of the following form:
<Entitys>
  <Entity>
    <name>Sara</name>
    <age>30</age>
    <adress>Adress of Sara</adress>
    <adress2>Adress line 2 of Sara</adress2>
    <email>Sara@gmail1.com</email>
  </Entity>
  <Entity>
    <name>Smith Wasten</name>
    <age>25</age>
    <adress>Adress of Smith Wasten</adress>
    <email>Smith Wasten@example.com</email>
  </Entity>
  <Entity>
    <name>Michael Cornival</name>
    <age>40</age>
    <adress>Adress of Michael Cornival</adress>
    <email>michael.Cornival@hotmail1.com</email>
  </Entity>
</Entitys>
The steps to convert from XML to CSV
Reading the XML file
We will use the os package to open the XML file, as shown below:
xmlFile, err := os.Open("test.xml")
if err != nil {
	fmt.Println("Error opening XML file:", err)
	return
}
defer xmlFile.Close()
Determining the dynamic schema
We will analyze the XML data to determine the headers for the CSV file dynamically. To do so, we define a struct that collects information from the XML file, as shown below:
type Node struct {
	XMLName  xml.Name
	// ",any,attr" collects every attribute; a tag of "-" would skip the field entirely
	Attrs    []xml.Attr `xml:",any,attr"`
	Children []Node     `xml:",any"`
	Text     string     `xml:",chardata"`
}
XMLName: This field stores the XML tag name.
Attrs: This field stores the attributes used in the XML tag.
Children: This field stores the child tags of the current tag, recursively.
Text: This field stores the text (character data) inside the tag.
These fields can vary depending on the dataset in the XML file. The main purpose, however, is to handle XML data whose records follow a similar pattern.
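To make the role of each field concrete, here is a minimal sketch that decodes a small in-memory document into such a struct and prints the resulting tree. The parse and describe helpers and the sample document (including its id attribute) are assumptions made for this example, and the Attrs field uses the ",any,attr" tag so that attributes are actually captured.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Node is a generic element: tag name, attributes, children, and text.
type Node struct {
	XMLName  xml.Name
	Attrs    []xml.Attr `xml:",any,attr"`
	Children []Node     `xml:",any"`
	Text     string     `xml:",chardata"`
}

// parse decodes an XML document held in a string into a Node tree.
func parse(data string) (Node, error) {
	var root Node
	err := xml.Unmarshal([]byte(data), &root)
	return root, err
}

// describe returns one line per element, indented by depth, showing the
// tag name, the number of attributes, and the trimmed text content.
func describe(n Node, depth int) []string {
	line := fmt.Sprintf("%s%s attrs=%d text=%q",
		strings.Repeat("  ", depth), n.XMLName.Local,
		len(n.Attrs), strings.TrimSpace(n.Text))
	lines := []string{line}
	for _, child := range n.Children {
		lines = append(lines, describe(child, depth+1)...)
	}
	return lines
}

func main() {
	// Assumed sample document; the id attribute is illustrative only.
	const sample = `<Entitys><Entity id="1"><name>Sara</name><age>30</age></Entity></Entitys>`
	root, err := parse(sample)
	if err != nil {
		fmt.Println("Error decoding XML:", err)
		return
	}
	for _, line := range describe(root, 0) {
		fmt.Println(line)
	}
}
```

Because Children is itself a slice of Node, the same struct describes the root, each record, and each field, which is what lets the program handle tags it has never seen before.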
Parsing the XML
We will use the encoding/xml package to parse the XML data into Go structs.
decoder := xml.NewDecoder(xmlFile)
var rootNode Node
err = decoder.Decode(&rootNode)
if err != nil {
	fmt.Println("Error decoding XML:", err)
	return
}
Line 1: Create a decoder from the xmlFile object using the NewDecoder function.
Line 3: Decode the XML data and store it in rootNode.
Generating the CSV File
We will use the encoding/csv package to create and write a CSV file. We will write the determined headers first, then iterate through the parsed XML data again to extract values and write them into the corresponding CSV rows.
// Create the CSV file and its writer (the output directory must already exist)
csvFile, err := os.Create("output/output.csv")
if err != nil {
	fmt.Println("Error creating CSV file:", err)
	return
}
defer csvFile.Close()
writer := csv.NewWriter(csvFile)
defer writer.Flush()

// Determine the header: use the tag names of the entity with the most children
var header []string
for _, node := range rootNode.Children {
	var csvHeader []string
	for _, child := range node.Children {
		csvHeader = append(csvHeader, child.XMLName.Local)
	}
	if len(csvHeader) > len(header) {
		header = csvHeader
	}
}

// Write the header to the CSV file first
if err := writer.Write(header); err != nil {
	fmt.Println("Error writing CSV header:", err)
	return
}

// Iterate over rootNode again to write each entity's data under the header
for _, node := range rootNode.Children {
	var csvData []string
	j := 0
	for _, child := range node.Children {
		// Add empty cells for header columns this entity does not have;
		// the bounds check avoids an index-out-of-range panic on unknown tags
		for j < len(header) && child.XMLName.Local != header[j] {
			csvData = append(csvData, "")
			j++
		}
		csvData = append(csvData, child.Text)
		j++
	}
	if err := writer.Write(csvData); err != nil {
		fmt.Println("Error writing CSV row:", err)
		return
	}
}
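The positional matching above relies on every record listing its tags in the same order as the header. As a hedged alternative, the sketch below places each value by looking up its tag name in the header, so the order of tags within a record no longer matters. The field type and the buildRow and toCSV helpers are hypothetical names introduced for this example, and the CSV is written to an in-memory buffer rather than a file to keep the sketch self-contained.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// field is a hypothetical name/value pair standing in for one child element.
type field struct {
	Name, Value string
}

// buildRow puts each value in the column whose header matches its name.
// Columns without a matching field stay empty; fields whose name is not
// in the header are skipped.
func buildRow(header []string, fields []field) []string {
	index := make(map[string]int, len(header))
	for i, h := range header {
		index[h] = i
	}
	row := make([]string, len(header))
	for _, f := range fields {
		if i, ok := index[f.Name]; ok {
			row[i] = f.Value
		}
	}
	return row
}

// toCSV writes the header and one aligned row per record, returning the text.
func toCSV(header []string, records [][]field) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write(header); err != nil {
		return "", err
	}
	for _, rec := range records {
		if err := w.Write(buildRow(header, rec)); err != nil {
			return "", err
		}
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	header := []string{"name", "age", "adress", "adress2", "email"}
	records := [][]field{
		{{"name", "Sara"}, {"age", "30"}, {"adress2", "Line 2"}},
		{{"email", "smith@example.com"}, {"name", "Smith Wasten"}},
	}
	out, err := toCSV(header, records)
	if err != nil {
		fmt.Println("Error writing CSV:", err)
		return
	}
	fmt.Print(out)
}
```

Every row produced this way has exactly as many cells as the header, which keeps the CSV rectangular even when a record is missing trailing tags.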
Complete code
The complete program below reads the test.xml file and converts it into the output/output.csv file. You can replace test.xml with another file of a similar layout to experiment with it.
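package main

import (
	"encoding/csv"
	"encoding/xml"
	"fmt"
	"os"
)

// Node is a generic XML element: tag name, attributes, children, and text
type Node struct {
	XMLName  xml.Name
	Attrs    []xml.Attr `xml:",any,attr"`
	Children []Node     `xml:",any"`
	Text     string     `xml:",chardata"`
}

func main() {
	// Open the XML file
	xmlFile, err := os.Open("test.xml")
	if err != nil {
		fmt.Println("Error opening XML file:", err)
		return
	}
	defer xmlFile.Close()

	// Decode the whole document into a tree of Nodes
	decoder := xml.NewDecoder(xmlFile)
	var rootNode Node
	err = decoder.Decode(&rootNode)
	if err != nil {
		fmt.Println("Error decoding XML:", err)
		return
	}

	// Create the CSV file and its writer (the output directory must already exist)
	csvFile, err := os.Create("output/output.csv")
	if err != nil {
		fmt.Println("Error creating CSV file:", err)
		return
	}
	defer csvFile.Close()
	writer := csv.NewWriter(csvFile)
	defer writer.Flush()

	// Determine the header: use the tag names of the entity with the most children
	var header []string
	for _, node := range rootNode.Children {
		var csvHeader []string
		for _, child := range node.Children {
			csvHeader = append(csvHeader, child.XMLName.Local)
		}
		if len(csvHeader) > len(header) {
			header = csvHeader
		}
	}
	if err := writer.Write(header); err != nil {
		fmt.Println("Error writing CSV header:", err)
		return
	}

	// Write one row per entity, leaving empty cells for missing tags
	for _, node := range rootNode.Children {
		var csvData []string
		j := 0
		for _, child := range node.Children {
			// The bounds check avoids an index-out-of-range panic on unknown tags
			for j < len(header) && child.XMLName.Local != header[j] {
				csvData = append(csvData, "")
				j++
			}
			csvData = append(csvData, child.Text)
			j++
		}
		if err := writer.Write(csvData); err != nil {
			fmt.Println("Error writing CSV row:", err)
			return
		}
	}
}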