What is the DataFrame.with_columns function in Polars?

The with_columns function in Polars provides a convenient way to add new columns to a DataFrame, or to replace existing ones. It returns a new DataFrame, but the data already held in the existing columns is not duplicated. It is part of the DataFrame API, which is designed for efficient manipulation and transformation of tabular data, and it is especially useful for enriching a DataFrame with calculated or derived columns.

Syntax of with_columns

The with_columns function is defined as follows:

DataFrame.with_columns(*exprs: IntoExpr | Iterable[IntoExpr], **named_exprs: IntoExpr) -> DataFrame
  • exprs: The columns to add, specified as positional arguments (individually or as an iterable such as a list). The function accepts expression input: strings are interpreted as column names, and other non-expression inputs are interpreted as literals.

  • named_exprs: Additional columns to add, specified as keyword arguments. Each resulting column is named after the keyword used.

The function returns a new DataFrame with the specified columns added. If a new column has the same name as an existing column, the existing column is replaced.

Code examples demonstrating with_columns

First, let’s look at a simple example of the with_columns function:

```python
import polars as pl
# Creating DataFrame
data = pl.DataFrame(
    {
        "alpha": [4, 6, 8, 10],
        "beta": [5, 4.8, 10.2, 20],
        "gamma": [True, False, False, True],
    }
)
# Using .with_columns to add a new column
new_dataFrame = data.with_columns((pl.col("alpha") ** 3).alias("alpha^3"))
# Printing the values
print(new_dataFrame)
```

Code explanation

Let’s discuss the code step by step:

  • Lines 3–9: We create a DataFrame named data with three columns: alpha, beta, and gamma. The alpha column contains integers, the beta column contains floats, and the gamma column contains booleans.

  • Line 11: We use the with_columns function to add a new column, alpha^3, that contains the cube of each value in alpha. The result is assigned to new_dataFrame; data itself is not modified.

  • Line 13: We print the DataFrame with the added column.

Next, let’s look at a more complex example that adds multiple columns using .with_columns:

```python
import polars as pl
# Creating DataFrame
data = pl.DataFrame(
    {
        "alpha": [4, 6, 8, 10],
        "beta": [5, 4.8, 10.2, 20],
        "gamma": [True, False, False, True],
    }
)
# Adding multiple columns
new_dataFrame = data.with_columns(
    [
        (pl.col("alpha") ** 3).alias("alpha^3"),
        (pl.col("beta") * 3).alias("beta*3"),
        (pl.col("gamma").not_()).alias("not gamma"),
    ]
)
# Printing the values
print(new_dataFrame)
```

Code explanation

Let's discuss the code step by step:

  • Lines 3–9: We create a DataFrame named data with three columns: alpha, beta, and gamma. The alpha column contains integers, the beta column contains floats, and the gamma column contains booleans.

  • Lines 11–17: We add three new columns in a single call: alpha^3 (the cube of alpha), beta*3 (the values of beta multiplied by 3), and not gamma (the logical negation of gamma). The result is assigned to the variable new_dataFrame.

  • Line 19: We print the DataFrame with the added columns.

Finally, let’s explore expressions with multiple outputs. These can be automatically converted into Struct columns by enabling the Config.set_auto_structify(True) setting, here applied temporarily through the pl.Config context manager:

```python
import polars as pl
# Creating DataFrame
data = pl.DataFrame(
    {
        "alpha": [4, 6, 8, 10],
        "beta": [5, 4.8, 10.2, 20],
        "gamma": [True, False, False, True],
    }
)
with pl.Config(auto_structify=True):  # setting applies only inside this block
    new_dataFrame = data.drop("gamma").with_columns(
        diffs=pl.col(["alpha", "beta"]).diff().name.suffix("_diff"),
    )
# Printing the values
print(new_dataFrame)
```

Explanation

Let’s discuss the code step by step:

  • Lines 3–9: We create a DataFrame named data with three columns: alpha, beta, and gamma. The alpha column contains integers, the beta column contains floats, and the gamma column contains booleans.

  • Lines 10–13: We use a with block to enable the auto_structify configuration option; the previous setting is restored when the block ends. Inside the block, we create new_dataFrame by first dropping the gamma column and then adding a new column named diffs. The expression pl.col(["alpha", "beta"]).diff() computes the difference between consecutive rows in each of the two columns (the first row of each is null), and .name.suffix("_diff") names these outputs alpha_diff and beta_diff. Because this single keyword argument produces two outputs, auto_structify packs them into one Struct column named diffs, with alpha_diff and beta_diff as its fields.

  • Line 15: We print the DataFrame with the added column.

Conclusion

The with_columns function in Polars is a powerful tool for adding new columns to a DataFrame or replacing existing ones. It offers a flexible and concise syntax (positional expressions, keyword arguments, or both), and it is particularly useful when we want to enrich data with calculated or derived values. Although it returns a new DataFrame, the data in the existing columns is not duplicated, which makes it an efficient way to build on an existing DataFrame.


Copyright ©2025 Educative, Inc. All rights reserved