Trusted answers to developer questions

Related Tags

communitycreator
dataframe
python
pandas
apache parquet

How to write a DataFrame to a Parquet file in Python

abhilash

Overview

Apache Parquet is a column-oriented, open-source data file format for data storage and retrieval. It offers high-performance data compression and encoding schemes to handle large amounts of complex data.

We use the pandas DataFrame.to_parquet() method to write a DataFrame to a Parquet file. The method relies on a Parquet engine, so either pyarrow or fastparquet must be installed.

Note: Refer to What is pandas in Python? to learn more about pandas.

Syntax

DataFrame.to_parquet(path=None, engine='auto', compression='snappy', index=None, partition_cols=None, storage_options=None, **kwargs)

Parameters

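  • path: This is the path to the Parquet file to write. If it is None, the method returns the Parquet data as bytes instead of writing a file.
  • engine: This parameter indicates which Parquet library to use. The available options are auto, pyarrow, and fastparquet. With auto, pandas tries pyarrow first and falls back to fastparquet if pyarrow is unavailable.
  • compression: This parameter indicates the type of compression to use. The available options include snappy, gzip, brotli, lz4, and zstd. Pass None for no compression. The default compression is snappy.
  • index: This parameter is True, False, or None (the default). If True, the DataFrame’s index is written to the file. If False, the index is not written. If None, the index is written, except that a RangeIndex is stored only as metadata rather than as values.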
  • partition_cols: These are the names of the columns that partition the DataFrame. The order in which the columns are given determines the order in which they are partitioned.
  • storage_options: These are the extra options for a certain storage connection, such as a host, port, username, password, and so on.

Example

import pandas as pd
import os

data = [['dom', 10], ['abhi', 15], ['celeste', 14]]

df = pd.DataFrame(data, columns=['Name', 'Age'])

df.to_parquet("dataframe.parquet")

print("Listing the contents of the current directory:")
print(os.listdir('.'))
Generating the Parquet file using to_parquet()

Explanation

  • Lines 1–2: We import the pandas and os packages.
  • Line 4: We define the data for constructing the pandas DataFrame.
  • Line 6: We convert data to a pandas DataFrame called df.
  • Line 8: We write df to a Parquet file using the to_parquet() method. The resulting file is named dataframe.parquet.
  • Lines 10–11: We list the items in the current directory using the os.listdir method. We observe that the dataframe.parquet file is created.
