Polars is a fast and efficient data manipulation library written in Rust. It’s designed for high-performance operations on large tabular datasets and is often faster than pandas on such workloads.
The polars.from_pandas() function converts pandas DataFrames into Polars DataFrames, so existing pandas-based code can hand its data to Polars. Once the data is in Polars, later operations can benefit from parallel execution and, in many cases, lower memory usage, which helps in data-intensive workflows such as machine learning, ETL pipelines, and analytics. It also provides an easy transition path from pandas to Polars when optimizing existing pandas-based projects. Depending on the Polars version, the conversion may also require pyarrow to be installed.
First, let’s import the polars library:
import polars as pl
After importing the library, let’s examine in detail how polars.from_pandas() works.
The polars.from_pandas() function
The polars.from_pandas() function converts a pandas DataFrame or Series into the corresponding Polars DataFrame or Series.
polars.from_pandas(data, *, schema_overrides=None, rechunk=True, nan_to_null=True, include_index=False)
The parameters are described below:
data: A pandas DataFrame, Series, or Index.
schema_overrides: Optional. A mapping from column names to Polars data types that overrides the types inferred for those columns. If not provided (the default is None), Polars infers the schema from the input pandas data.
rechunk: Optional; the default is True. If True, the resulting data is stored in contiguous memory.
nan_to_null: Optional; the default is True. If True, NaN values in the input pandas data are converted to null values in the resulting Polars object. If False, NaN values are preserved as floating-point NaN.
include_index: Optional; the default is False. If True, any non-default pandas index is loaded as one or more columns of the resulting Polars DataFrame. If False, the index is not included.
The function returns a Polars DataFrame if data is a pandas DataFrame.
The function returns a Polars Series if data is a pandas Series or Index.
import pandas as pd
import polars as pl

# Creating a pandas DataFrame
pd_df = pd.DataFrame([[1, 2, 3], [0, 1, 2]], columns=["A", "B", "C"])

# Printing the pandas DataFrame
print(pd_df)

# Converting pandas DataFrame to a Polars DataFrame
df = pl.from_pandas(pd_df)

# Printing the Polars DataFrame
print(df)
Lines 1–2: We import the pandas and polars libraries as pd and pl, respectively.
Line 5: We create a pandas DataFrame named pd_df. The DataFrame is initialized with a 2 x 3 matrix (2 rows, 3 columns) containing numeric values.
Line 8: We print the pandas DataFrame.
Line 11: We use the from_pandas() function to convert the previously created pandas DataFrame (pd_df) to a Polars DataFrame (df).
Line 14: We print the Polars DataFrame.