What is the polars.from_pandas method in Polars?

Polars is a DataFrame library written in Rust that offers a Python API. It’s designed for high-performance operations on large tabular datasets and is often faster than pandas because it runs work on multiple threads and stores data in a columnar, Apache Arrow-based memory format.

The polars.from_pandas() method converts pandas objects into their Polars equivalents, which makes it a convenient entry point when moving an existing pandas-based project to Polars. Once the data is in Polars, later operations can benefit from parallel processing and often lower memory usage, which helps in data-intensive workflows like machine learning preprocessing and ETL pipelines. The conversion itself is a single function call, so we can adopt Polars step by step instead of rewriting a whole codebase at once.

Import the library

First, let’s import the polars library.

import polars as pl
Importing the polars library

After importing the library, let’s examine in detail how polars.from_pandas() works.

The polars.from_pandas() method

The polars.from_pandas() method converts a pandas DataFrame into a Polars DataFrame, and a pandas Series or Index into a Polars Series. It requires both pandas and pyarrow to be installed.

Syntax

polars.from_pandas(data, *, schema_overrides=None, rechunk=True, nan_to_null=True, include_index=False)
Syntax of the from_pandas method

Parameters

The parameters are described below. Every parameter after data must be passed by keyword.

  • data: This is the pandas DataFrame, Series, or Index to convert.

  • schema_overrides: This is an optional parameter. If provided, it is a dictionary that maps column names to Polars data types and overrides the inferred type for those columns. If not provided (default is None), Polars infers the type of every column from the input pandas data.

  • rechunk: This is also an optional parameter, and its default value is True. If set to True, Polars makes sure that the resulting data is stored in contiguous memory.

  • nan_to_null: This is also an optional parameter, and its default value is True. If set to True, NaN values in the input pandas data are converted to null values in the result. If set to False, NaN values are kept as floating-point NaN values.

  • include_index: This is also an optional parameter, and its default value is False. If set to True, any non-default index of the input pandas DataFrame or Series is loaded as one or more columns in the result. An unnamed default index that only enumerates the rows isn't included regardless of this setting; calling reset_index() on the pandas object first turns such an index into a regular column. If set to False, the index is not included.

Return value

  1. The method returns a Polars DataFrame if data is a pandas DataFrame.

  2. The method returns a Polars Series if data is a pandas Series or Index.

Code

import pandas as pd
import polars as pl

# Creating a pandas DataFrame
pd_df = pd.DataFrame([[1, 2, 3], [0, 1, 2]], columns=["A", "B", "C"])

# Printing the pandas DataFrame
print(pd_df)

# Converting pandas DataFrame to a Polars DataFrame
df = pl.from_pandas(pd_df)

# Printing the Polars DataFrame
print(df)

Explanation

  • Lines 1–2: We import the pandas and polars libraries as pd and pl, respectively.

  • Line 5: We create the pandas DataFrame named pd_df. The DataFrame is initialized with a 2 x 3 matrix (2 rows, 3 columns) containing numeric values.

  • Line 8: We print the pandas DataFrame.

  • Line 11: We use the from_pandas() method to convert the previously created pandas DataFrame (pd_df) to a Polars DataFrame (df).

  • Line 14: We print the Polars DataFrame.

Copyright ©2024 Educative, Inc. All rights reserved