What is the DataFrame.with_columns function in Polars?
The with_columns function in Polars adds new columns to a DataFrame and returns the result as a new DataFrame, without copying the existing data. It is part of the DataFrame API for efficiently manipulating and transforming tabular data, and it is most useful for adding calculated or derived columns.
Syntax of with_columns
The with_columns function is defined as follows:
DataFrame.with_columns(*exprs: IntoExpr | Iterable[IntoExpr], **named_exprs: IntoExpr)
exprs: The exprs parameter represents the columns to be added, which are specified as positional arguments. It accepts expression input, where strings are interpreted as column names, and other non-expression inputs are interpreted as literals.
named_exprs: The named_exprs parameter represents additional columns to be added, specified as keyword arguments. These columns will be renamed according to the keywords provided.
The function returns a new DataFrame with the specified columns added.
Code examples for with_columns
First, let’s look at a simple example of using the with_columns function:
import polars as pl

# Creating DataFrame
data = pl.DataFrame({
    "alpha": [4, 6, 8, 10],
    "beta": [5, 4.8, 10.2, 20],
    "gamma": [True, False, False, True],
})

# Using .with_columns to add a new column
new_dataFrame = data.with_columns((pl.col("alpha") ** 3).alias("alpha^3"))

# Printing the values
print(new_dataFrame)
Code explanation
Let’s discuss the code step-by-step:
Lines 3–9: We create a DataFrame named data with three columns named alpha, beta, and gamma. The alpha column contains integer values, the beta column contains float values, and the gamma column contains boolean values.
Line 11: We add a new column, alpha^3, which holds the cube of the alpha column, using the with_columns function.
Line 14: We print the DataFrame with the new column added.
Second, let’s look at a more involved example that adds multiple columns using with_columns:
import polars as pl

# Creating DataFrame
data = pl.DataFrame({
    "alpha": [4, 6, 8, 10],
    "beta": [5, 4.8, 10.2, 20],
    "gamma": [True, False, False, True],
})

# Adding multiple columns
new_dataFrame = data.with_columns(
    [
        (pl.col("alpha") ** 3).alias("alpha^3"),
        (pl.col("beta") * 3).alias("beta*3"),
        (pl.col("gamma").not_()).alias("not gamma"),
    ]
)

# Printing the values
print(new_dataFrame)
Code explanation
Let's discuss the code step by step:
Lines 3–9: We create a DataFrame named data with three columns named alpha, beta, and gamma. The alpha column contains integer values, the beta column contains float values, and the gamma column contains boolean values.
Lines 11–17: We add three new columns: the cube of the alpha column, the beta column multiplied by 3, and the logical negation of the gamma column. We assign the result to the variable new_dataFrame.
Line 20: We print the DataFrame with the added columns.
Finally, let’s explore expressions with multiple outputs. These can be automatically transformed into Structs by enabling the setting Config.set_auto_structify(True):
import polars as pl

# Creating DataFrame
data = pl.DataFrame({
    "alpha": [4, 6, 8, 10],
    "beta": [5, 4.8, 10.2, 20],
    "gamma": [True, False, False, True],
})

# Enabling auto_structify only inside the with block
with pl.Config(auto_structify=True):
    new_dataFrame = data.drop("gamma").with_columns(
        diffs=pl.col(["alpha", "beta"]).diff().name.suffix("_diff"),
    )

# Printing the values
print(new_dataFrame)
Code explanation
Let’s discuss the code step-by-step:
Lines 3–9: We create a DataFrame named data with three columns named alpha, beta, and gamma. The alpha column contains integer values, the beta column contains float values, and the gamma column contains boolean values.
Lines 11–14: We use a with block to set a Polars configuration option (auto_structify) to True. Inside the block, a new DataFrame named new_dataFrame is created by first dropping the column gamma and then adding a new column named diffs. The diffs column is a struct: for each of the columns alpha and beta, it holds the difference between consecutive rows, and the resulting field names are suffixed with _diff.
Line 17: We print the DataFrame with the new column added.
Conclusion
The with_columns function in Polars is a flexible and concise way to add new columns to a DataFrame, and it's particularly useful when we want to enrich the data with calculated or derived values. Because it returns a new DataFrame without duplicating the existing data, it is an efficient way to extend a DataFrame.