What is Python pivot_table() in Pandas?

Overview

pivot_table() creates a spreadsheet-style pivot table as a DataFrame in Pandas. The levels of the pivot table are stored in multiIndex objects, i.e., hierarchical indexes on the index and columns of the resultant DataFrame.

Syntax

Parameters

data: This parameter represents the DataFrame.
values: This is the column to aggregate. It is an optional parameter.
index:

Type = column, grouper, array, or list of the previous
It is a required parameter. This parameter helps group our data using the index of the pivot table. When an array is passed, it must be used the same as the column values.

columns:

Type = column, grouper, array, or list of the previous
It is a required parameter. This parameter helps group our data using the column of the pivot table. When an array is passed, it must be used the same as the column values.

Note: For index and columns parameter, if the array is passed, its length must be the same as the length of data. The list can contain any type other than the list.

aggfunc:

Type: A function, list of functions, or a dict
Default: numpy.mean
It is a required parameter.

Note: If we pass the list of functions, the resultant pivot table contains hierarchical columns. The top level of these columns are the names of functions. These names are concluded from the objects of functions by themselves. If we pass dict, the key is the column to aggregate. The value is a function or list of functions.

fill_value:

Type: Scalar
Default = None
It contains the value used to replace the missing values within the resultant pivot table after the aggregation.

margins:

Type: Boolean
Default = False
It is used to add all the columns and rows. For example, we can use it for grand totals and subtotals.

dropna:

Type: Boolean
Default: True
This parameter helps to exclude columns whose values are

margins_name:

Type: String
Default: 'All'
It is the name of the row or column that will hold the totals when the margins’ value is True.

observed:

Type: Boolean
Default: False
It applies if any of the groupers are categorical.
If it is True, it displays the observed values for categorical grouper. If it is False, it displays all the values for categorical groupers.

sort:

Type: Boolean
Default: True
It must be defined to sort the result.

Return value

pivot_table() returns the DataFrame, which is an Excel style pivot table.

Code example

Let us discuss an example regarding the pivot_table() method.

import pandas as pd
import numpy as np 
dtf = pd.DataFrame({"cl1": ["pr1", "pr1", "pr1", "pr1", "pr1",
                         "pr2", "pr2", "pr2", "pr2"],
                   "cl2": ["1", "1", "1", "2", "2",
                         "1", "1", "2", "2"],
                   "cl3": ["half", "half", "half", "half",
                         "full", "full", "half", "half",
                         "full"],
                   "cl4": [6, 3, 2, 6, 3, 2, 5, 1, 7],
                   "cl5": [3, 4, 5, 2, 6, 5, 8, 8, 5]})
print(dtf)
# In the above code, we defined the dataframe.
tbl1 = pd.pivot_table(dtf, values='cl4', index=['cl1', 'cl2'], columns=['cl3'], aggfunc=np.sum)
print(tbl1)
tbl2 = pd.pivot_table(dtf, values='cl4', index=['cl1', 'cl2'], columns=['cl3'], aggfunc=np.sum, fill_value=0)
print(tbl2)
tbl3 = pd.pivot_table(dtf, values=['cl4', 'cl5'], index=['cl1', 'cl3'], aggfunc={'cl4': np.mean,'cl5': np.mean})
print(tbl3)
tbl4 = pd.pivot_table(dtf, values=['cl4', 'cl5'], index=['cl1', 'cl3'], aggfunc={'cl4': np.mean,'cl5': [min, max, np.mean]})
print(tbl4)