What is the pandas DataFrame.sub() method?

Overview

The pandas software library is written for Python and is mostly used for data analysis. It works as a data manipulation module.

The pandas DataFrame is a two-dimensional tabular data structure in which data is arranged in labeled rows and columns.

The pandas DataFrame consists of three principal components:

  • Data
  • Rows (placed left to right horizontally)
  • Columns (placed top to bottom vertically)

The pandas DataFrame.sub() method

The name sub() stands for subtraction. The method subtracts other from a DataFrame element by element, like the binary subtraction operator (-), but it also accepts parameters for choosing the axis and for handling missing values.
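To illustrate the equivalence with the - operator, here is a minimal sketch. The scores DataFrame and its values are made up for this example only.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
scores = pd.DataFrame({"ClassA": [100, 50, 10],
                       "ClassB": [50, 20, 30]})

with_method = scores.sub(10)   # method form
with_operator = scores - 10    # operator form

# Both forms give the same element-wise result
print(with_method.equals(with_operator))
```

The method form is mainly useful when the extra parameters (axis, level, fill_value) are needed, since the operator form cannot take them.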

Syntax


DataFrame.sub(other, axis='columns', level=None, fill_value=None)

Parameters

The method takes the following parameters:

  • other: The value to subtract. It can be a scalar (a constant), a sequence or other list-like object, a Series, or a DataFrame.
  • axis: This decides the axis along which a Series or list-like other is matched: the index (0 or 'index') or the columns (1 or 'columns'). The default is 'columns'.
  • level: This parameter is an integer or a label. It broadcasts across a level, matching index values on the passed MultiIndex level.
  • fill_value: This parameter is a number or None (the default). Missing (NaN) values are replaced with this value before subtracting. If data in both corresponding DataFrame locations is missing, the result will be missing.

Here, the first parameter is required and the other three are optional.
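As a rough sketch of how fill_value behaves, the example below subtracts two small DataFrames that contain NaN values. The names left and right and all of the numbers are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrames with some missing entries
left = pd.DataFrame({"x": [10.0, np.nan, np.nan],
                     "y": [5.0, 6.0, 7.0]})
right = pd.DataFrame({"x": [1.0, 2.0, np.nan],
                      "y": [1.0, np.nan, 1.0]})

# Without fill_value, any NaN operand gives a NaN result
plain = left.sub(right)

# With fill_value=0, a NaN on one side is treated as 0 before subtracting.
# Row 2 of column "x" is missing in both frames, so it stays NaN.
filled = left.sub(right, fill_value=0)

print(plain)
print()
print(filled)
```

In the filled result, row 1 of column x is 0 - 2 = -2 and row 1 of column y is 6 - 0 = 6, while the entry that is missing on both sides remains NaN.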

Return value

This method returns a DataFrame that holds the result of subtracting other (another DataFrame, a Series, a sequence, or a scalar) from the calling DataFrame.

Explanation

The first step is to import pandas. Here, we import pandas as pd, so pd is used in place of pandas throughout the program.

Subtraction of a single value from all entries of a DataFrame

# importing pandas as pd
import pandas as pd
# Creating a DataFrame with four columns and three rows
df = pd.DataFrame({"ClassA": [100, 50, 10],
                   "ClassB": [50, 20, 30],
                   "ClassC": [70, 70, 25],
                   "ClassD": [150, 300, 0]})
# Print the DataFrame
print(df)
print()
# Subtracting 10 from every value
print(df.sub(10))
  • Lines 4–7: We create a DataFrame from a dictionary whose keys are the class names.
  • Line 9: We print the DataFrame.
  • Line 12: We call the subtraction method, df.sub(10), and print the result.

When a single value (10) is passed, it is subtracted from every entry of the DataFrame.

Subtraction of distinct values from different columns of a DataFrame

# Subtract these elements from the respective class
print(df.sub([20, 10, 5, 1], axis='columns'))

We are subtracting 20 from the first class, 10 from the second class, 5 from the third class, and 1 from the fourth class where the axis is columns.

Series subtraction along the index of a DataFrame

# importing pandas as pd
import pandas as pd
# Creating a DataFrame with three observations
df = pd.DataFrame({"ClassA": [100, 50, 10],
                   "ClassB": [50, 20, 30],
                   "ClassC": [70, 70, 25]})
# Print the DataFrame
print(df)
print()
# Subtracting a Series, matched on the row index
print(df.sub(pd.Series([5, 10, 2], index=[0, 1, 2]), axis='index'))

Explanation

Here, the Series has three elements, 5, 10, and 2, with index labels 0, 1, and 2. Since the method is applied index-wise (axis='index'), each Series element is matched to the row with the same label. The first element, 5, is subtracted from every value in row 0, the next element, 10, is subtracted from every value in row 1, and the last element, 2, is subtracted from every value in row 2.
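To contrast the two axis settings, the following sketch subtracts one Series matched on row labels and another matched on column labels. The names row_offsets and col_offsets and their values are hypothetical and chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"ClassA": [100, 50, 10],
                   "ClassB": [50, 20, 30],
                   "ClassC": [70, 70, 25]})

# Hypothetical Series labeled by row index: one value per row
row_offsets = pd.Series([5, 10, 2], index=[0, 1, 2])
# Hypothetical Series labeled by column name: one value per column
col_offsets = pd.Series([20, 10, 5], index=["ClassA", "ClassB", "ClassC"])

# axis='index' matches Series labels against the row labels
by_row = df.sub(row_offsets, axis="index")
# axis='columns' (the default) matches Series labels against the column names
by_col = df.sub(col_offsets, axis="columns")

print(by_row)
print()
print(by_col)
```

In by_row, every value in row 1 drops by 10. In by_col, every value in ClassA drops by 20, every value in ClassB by 10, and every value in ClassC by 5.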

We can perform a variety of subtractions on one or more DataFrames by changing the parameters of the DataFrame.sub() method. If the data contains missing (NaN) values, we can use the fill_value parameter to supply the value that should stand in for them during the subtraction.
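The same idea applies when two DataFrames do not share all of their column labels. The sketch below is an illustration under assumed data: term1 and term2 are hypothetical DataFrames that overlap only on ClassB.

```python
import pandas as pd

# Hypothetical DataFrames that share only the "ClassB" column
term1 = pd.DataFrame({"ClassA": [100, 50], "ClassB": [50, 20]})
term2 = pd.DataFrame({"ClassB": [5, 5], "ClassC": [1, 1]})

# Labels are aligned first; columns present in only one frame become NaN
diff = term1.sub(term2)

# fill_value=0 stands in for the side that has no data for a label
diff_filled = term1.sub(term2, fill_value=0)

print(diff)
print()
print(diff_filled)
```

With fill_value=0, ClassA keeps its original values (100 - 0 and 50 - 0), and ClassC becomes 0 - 1 = -1 in each row.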
