Trusted answers to developer questions

Related Tags

pandas
python
groupby

What is the groupby command in pandas?

Educative Answers Team


Python has many libraries for data manipulation. One of them, pandas, provides the groupby method, which splits a dataset into groups based on the values of one or more selected columns. It is typically used to group large datasets and apply operations, such as aggregations, to each group.

The signature of groupby, with its default values, is:

DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False)

Parameters

  • by: mapping, function, label, or list of labels - Defines the groups for groupby. When several labels are passed, they are applied in the order given.

  • level: int, level name, or sequence of such - If the axis is a MultiIndex (hierarchical), group by one or more of its levels.

  • axis: 0 or 1 - Split along rows (0) or columns (1).

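  • as_index: bool - For aggregated output, return an object with the group labels as the index. With as_index=False, the group labels are returned as regular columns instead.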

  • sort: bool - Sort group keys.

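  • group_keys: bool - When calling apply, add the group keys to the index to identify the pieces.

  • squeeze: bool - Reduce the dimensionality of the return type, if possible. This parameter was deprecated in pandas 1.1 and removed in pandas 2.0.

  • observed: bool - Only applies if any of the groupers are Categorical. If True, only observed values of categorical groupers are shown; if False, all values are shown.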
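To make a couple of these parameters concrete, here is a minimal sketch of how as_index and sort change the result. The DataFrame and its column names (team, points) are made up for illustration and are not part of the dataset used later in this answer.

```python
import pandas as pd

# Hypothetical data, invented purely to illustrate the parameters
df = pd.DataFrame({
    "team": ["b", "a", "b", "a"],
    "points": [3, 1, 4, 2],
})

# Defaults (as_index=True, sort=True): group labels become a sorted index
default_sum = df.groupby("team")["points"].sum()

# as_index=False: the group label stays a regular column
flat_sum = df.groupby("team", as_index=False)["points"].sum()

# sort=False: groups keep the order in which they first appear
unsorted_sum = df.groupby("team", sort=False)["points"].sum()

print(default_sum)
print(flat_sum)
print(unsorted_sum)
```

With the defaults, the result is indexed by the sorted group labels; with sort=False, the group "b" comes first because it appears first in the data.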

Code

Let’s look at an example. Import the library and load the dataset into a DataFrame. Here, the dataset contains zip codes for different cities in the US.

data.csv
zip,city
35828,Danville
35828,Parma
29682,Six Mile
64759,Lamar
10028,New York
37204,Washington
10027,New York
19801,Wilmington
20008,Washington
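
The loading step could look like the following sketch. To keep it self-contained, the contents of data.csv are inlined as a string here; with the file on disk, pd.read_csv("data.csv") should do the same job. The variable names csv_text and df are just illustrative choices.

```python
import io

import pandas as pd

# Contents of data.csv, inlined so the sketch runs without the file
csv_text = """zip,city
35828,Danville
35828,Parma
29682,Six Mile
64759,Lamar
10028,New York
37204,Washington
10027,New York
19801,Wilmington
20008,Washington
"""

# Load the CSV text into a DataFrame with columns 'zip' and 'city'
df = pd.read_csv(io.StringIO(csv_text))

# Show the first few rows to confirm the data was loaded
print(df.head())
```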

Use groupby to group the zip codes by city.
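
A minimal sketch of this grouping step, assuming the same data.csv contents (inlined again so the block runs on its own). Collecting the zip codes into lists and counting rows per city are only two possible ways to inspect the groups; the variable names are illustrative.

```python
import io

import pandas as pd

# Contents of data.csv, inlined so the sketch runs without the file
csv_text = """zip,city
35828,Danville
35828,Parma
29682,Six Mile
64759,Lamar
10028,New York
37204,Washington
10027,New York
19801,Wilmington
20008,Washington
"""
df = pd.read_csv(io.StringIO(csv_text))

# Group the rows by the values in the 'city' column
grouped = df.groupby("city")

# Collect the zip codes that belong to each city
zips_per_city = grouped["zip"].apply(list)

# Count how many rows (zip codes) each city has
counts = grouped.size()

print(zips_per_city)
print(counts)
```

Cities that appear more than once in the file, such as New York and Washington, end up with two zip codes each; every other city has one.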


groupby can also group data by several columns at once, which produces multiple levels of groups.

Note: Grouping follows the order of the list passed to groupby, so the first element is the outermost grouping level.

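# Group by city, then by state (assumes df also has a 'state' column,
# which the sample data.csv above does not include)
grouped = df.groupby(['city', 'state'])
# Display the first row of each (city, state) group;
# use grouped.size() instead to count the zip codes in each group
grouped.first()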
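
Since the sample dataset has no state column, here is a self-contained sketch of multi-level grouping on a small hypothetical DataFrame. All values, including the zip codes and state labels, are invented for illustration.

```python
import pandas as pd

# Hypothetical data: every value here is made up for illustration
df = pd.DataFrame({
    "zip": [11111, 11112, 22222, 33333],
    "city": ["Springfield", "Springfield", "Springfield", "Salem"],
    "state": ["IL", "IL", "MO", "OR"],
})

# Group by city first, then by state within each city
grouped = df.groupby(["city", "state"])

# First row of each (city, state) group; the result has a MultiIndex
first_rows = grouped.first()

# Number of rows (zip codes) in each (city, state) group
counts = grouped.size()

print(first_rows)
print(counts)
```

Both results are indexed by (city, state) pairs, so a single group can be looked up with a tuple, for example counts[("Springfield", "IL")].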

For the full list of parameters and their behavior, see the official pandas documentation for DataFrame.groupby.

Copyright ©2022 Educative, Inc. All rights reserved
