Parquet: Reading & Writing
Explore how Parquet handles reading and writing of nested data structures using definition and repetition levels. Understand how Parquet achieves compression and interoperability with schemas like Avro, enhancing your ability to work with Big Data input and output formats.
Reading & Writing
Looking back at our schema, we can derive the maximum definition and repetition levels for each field in our record.
message Car {
  required string make;
  required int year;
  repeated group part {
    required string name;
    optional int life;
    repeated string oem;
  }
}
The table below explains the rationale behind the maximum values for definition and repetition levels:
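The rule behind these maximums is mechanical, so here is a minimal Python sketch that computes them. It assumes a hypothetical in-memory `Field` class for the schema (not part of any Parquet library) and applies the usual Dremel-style rule: every optional or repeated field on a column's path adds one to the maximum definition level, every repeated field adds one to the maximum repetition level, and required fields add nothing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    # Hypothetical schema node used only for this illustration.
    name: str
    repetition: str  # "required", "optional", or "repeated"
    children: List["Field"] = field(default_factory=list)


def max_levels(fields, prefix="", d=0, r=0):
    """Return {leaf_path: (max_definition_level, max_repetition_level)}."""
    result = {}
    for f in fields:
        # Optional and repeated fields may be absent, so each adds a definition level.
        fd = d + (1 if f.repetition in ("optional", "repeated") else 0)
        # Only repeated fields can repeat, so only they add a repetition level.
        fr = r + (1 if f.repetition == "repeated" else 0)
        path = prefix + f.name
        if f.children:
            # A group passes its accumulated levels down to its children.
            result.update(max_levels(f.children, path + ".", fd, fr))
        else:
            # Leaf fields correspond to columns in the Parquet file.
            result[path] = (fd, fr)
    return result


# The Car schema from above, expressed with the hypothetical Field class.
car = [
    Field("make", "required"),
    Field("year", "required"),
    Field("part", "repeated", [
        Field("name", "required"),
        Field("life", "optional"),
        Field("oem", "repeated"),
    ]),
]

for path, (d, r) in max_levels(car).items():
    print(f"{path}: max D = {d}, max R = {r}")
# make: max D = 0, max R = 0
# year: max D = 0, max R = 0
# part.name: max D = 1, max R = 1
# part.life: max D = 2, max R = 1
# part.oem: max D = 2, max R = 2
```

Note that the leaf itself counts when it is optional or repeated: `part.life` and `part.oem` both reach a maximum definition level of 2, but only `part.oem` reaches a maximum repetition level of 2 because it is the only column nested under two repeated fields.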
Now let's use the definition and repetition levels to write records and read them back. Consider the records below:
{
make : "Rolls Royce",
year : "2025",
part :{
name: "Tyre",
life: "2",
oem: "Bridgestone",
oem: "Michelin"
}
part : {
name: "Touch Screen",
life: "5"
...