Parquet: Repetition Level

This lesson explains repetition levels in Parquet.

Repetition Level

Now, we’ll dive into repetition levels. Repetition levels let Parquet store nested data structures compactly in a columnar format. We’ll reprint the car schema below for easy reference:

message Car {
  required string make;
  required int year;
  repeated group part {
    required string name;
    optional int life;
    repeated string oem;
  }
}

The group part in the above schema can repeat, and each occurrence holds information about one part of the car: the name of the part, its expected life in years, and a list of original equipment manufacturers (OEMs) that make that part for the car company. When fields can repeat, we need to know where a new list starts within a column of values. In other words, the repetition level is a marker of when to start a new list, and at which level of nesting. The repetition level stored with a value is the level at which a new list must be created for that value. Consider the record below:

  make        : "Rolls Royce",
  year        : 2025,
  part        : {
                   name: "Touch Screen",
                   life: 5,
                   oem: "Samsung",
                   oem: "Sony",
                   oem: "LG"
                },
  part        : {
                   name: "Tyre",
                   life: 2,
                   oem: "Bridgestone",
                   oem: "Michelin"
                }

The record has two parts. The first part lists three OEMs and the second lists two. The record itself, the first part, and the first OEM in the first part all have a repetition level of 0. Note that we have three levels of nesting: level 0 is the root of the tree (the Car record), level 1 is for part records, and level 2 is for values within the OEM list. An explanation of how repetition levels are assigned to the record and its constituent fields appears below:
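
As a rough illustration of the three nesting levels described above, the sketch below assigns a repetition level to each value in the oem column of the example record, then uses those levels to rebuild the per-part lists. It is not Parquet's implementation or API. The record is modelled as plain Python dicts and lists, the function names oem_repetition_levels and rebuild_oem_lists are hypothetical, and the sketch assumes every part has at least one OEM (empty or missing lists also need definition levels, which are not covered here).

```python
# Hypothetical in-memory form of the example record (plain dicts and lists,
# not a real Parquet API).
car = {
    "make": "Rolls Royce",
    "year": 2025,
    "part": [
        {"name": "Touch Screen", "life": 5, "oem": ["Samsung", "Sony", "LG"]},
        {"name": "Tyre", "life": 2, "oem": ["Bridgestone", "Michelin"]},
    ],
}


def oem_repetition_levels(record):
    """Return (value, repetition_level) pairs for the part.oem column.

    Assumes every part has at least one oem entry.
    """
    pairs = []
    for part_index, part in enumerate(record["part"]):
        for oem_index, oem in enumerate(part["oem"]):
            if oem_index > 0:
                level = 2  # same part, next entry in its oem list
            elif part_index > 0:
                level = 1  # first oem of a new part in the same record
            else:
                level = 0  # very first oem value of the record
            pairs.append((oem, level))
    return pairs


def rebuild_oem_lists(pairs):
    """Regroup one record's (value, level) pairs into one oem list per part."""
    parts = []
    for value, level in pairs:
        if level < 2:
            parts.append([])  # level 0 or 1: a new part, so a new list
        parts[-1].append(value)
    return parts


pairs = oem_repetition_levels(car)
for value, level in pairs:
    print(f"{value}: {level}")
# Samsung: 0
# Sony: 2
# LG: 2
# Bridgestone: 1
# Michelin: 2

print(rebuild_oem_lists(pairs))
# [['Samsung', 'Sony', 'LG'], ['Bridgestone', 'Michelin']]
```

The levels alone are enough to recover where each list starts: a 0 opens a new record, a 1 opens a new part within the same record, and a 2 appends to the OEM list of the current part.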
