Parquet: Repetition Level
Explore how repetition levels in Parquet format help efficiently store nested data by indicating when new lists start at various levels. Understand these concepts through practical examples using car parts and nested lists, providing a foundation to handle complex data with Parquet.
Repetition Level
Now, we’ll dive into repetition levels. Repetition levels help Parquet compactly store data in a columnar format, even for nested data structures. We’ll reprint the car schema below for easy reference:
message Car {
  required string make;
  required int year;
  repeated group part {
    required string name;
    optional int life;
    repeated string oem;
  }
}
The group part in the above schema can repeat, and each occurrence holds information about one particular part of the car. It includes the name of the part, its expected life in years, and a list of the original equipment manufacturers (OEMs) that make that part for the car company. When components can be repeated, we want to ...
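To make the idea concrete for the car schema, here is a minimal Python sketch of how repetition levels could be assigned to the values of the part.oem column. It is an illustration under stated assumptions, not the Parquet library's API: the function name oem_repetition_levels, the dictionary layout of the record, and the sample values (OEM-A, OEM-B, OEM-C) are all hypothetical. It assumes the usual Dremel-style convention: level 0 marks the first value of a new record, level 1 marks a value that begins a new part, and level 2 marks a further oem within the same part.

```python
from typing import List, Optional, Tuple


def oem_repetition_levels(car: dict) -> List[Tuple[Optional[str], int]]:
    """Return (value, repetition_level) pairs for the part.oem column of one record.

    Hypothetical helper for illustration only; real Parquet writers do this internally.
    """
    parts = car.get("part", [])
    if not parts:
        # A record with no parts still contributes one NULL entry at level 0.
        return [(None, 0)]

    out: List[Tuple[Optional[str], int]] = []
    for part_idx, part in enumerate(parts):
        # The first part opens the record (level 0); later parts repeat at level 1.
        start_level = 0 if part_idx == 0 else 1
        oems = part.get("oem", [])
        if not oems:
            # A part with an empty oem list is recorded as a NULL placeholder.
            out.append((None, start_level))
            continue
        for oem_idx, oem in enumerate(oems):
            # Additional oems inside the same part repeat at level 2.
            out.append((oem, start_level if oem_idx == 0 else 2))
    return out


# Hypothetical sample record; the values are placeholders, not real data.
car = {
    "make": "ExampleMake",
    "year": 2020,
    "part": [
        {"name": "brake", "life": 5, "oem": ["OEM-A", "OEM-B"]},
        {"name": "tire", "oem": []},
        {"name": "battery", "life": 4, "oem": ["OEM-C"]},
    ],
}

print(oem_repetition_levels(car))
# [('OEM-A', 0), ('OEM-B', 2), (None, 1), ('OEM-C', 1)]
```

Reading the output left to right, a 0 says a new car record begins, a 1 says a new part begins within the same car, and a 2 says the value is another oem for the part already in progress. The NULL entry keeps the empty oem list of the second part visible in the column.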