How to write a DataFrame to a Parquet file in Python

Overview

Apache Parquet is a column-oriented, open-source data file format for data storage and retrieval. It offers high-performance data compression and encoding schemes to handle large amounts of complex data.

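We use the pandas DataFrame.to_parquet() method to write a DataFrame to a Parquet file. The method relies on a Parquet engine, either pyarrow or fastparquet, so at least one of these libraries must be installed.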

Note: Refer to "What is pandas in Python?" to learn more about pandas.

Syntax

DataFrame.to_parquet(path=None, engine='auto', compression='snappy', index=None, partition_cols=None, storage_options=None, **kwargs)

Parameters

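path: This is the path (or a file-like object) to write the Parquet file to. If it is None, the method returns the file contents as bytes instead of writing to disk. When partition_cols is used, the path is treated as the root directory of the dataset.
engine: This parameter indicates which Parquet library to use. The available options are auto, pyarrow, and fastparquet. With auto, pandas tries pyarrow first and falls back to fastparquet if pyarrow is unavailable.
compression: This parameter indicates the type of compression to use. The supported options are snappy, gzip, brotli, lz4, and zstd. Use None for no compression. The default compression is snappy.
index: This is a boolean parameter. If True, the DataFrame's indexes are always written to the file. If False, the indexes are not written. If None (the default), the indexes are saved as with True, except that a RangeIndex is stored as a range in the file metadata instead of as values.
partition_cols: These are the names of the columns by which to partition the dataset. The columns are partitioned in the order in which they are given. This parameter must be None if path is not a string.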
storage_options: These are the extra options for a certain storage connection, such as a host, port, username, password, and so on.

Example

License: Creative Commons-Attribution-ShareAlike 4.0 (CC-BY-SA 4.0)