Introducing Selection & Filters in Pandas

Learn techniques for selecting subsets of data and filtering DataFrames.

Introduction

Two of the key features of Pandas are selection, which lets you access specific rows and columns, and filtering, which lets you keep only the data that meets specific conditions.

Selection lets you access specific rows or columns by their index label and/or their position in the DataFrame. If a medical dataset is indexed by patient insurance number, you can retrieve the patient whose insurance number is 123 with an index lookup. You can also access the first or last patient in the DataFrame, or the patient at any other position.
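Both kinds of selection can be sketched on a small, hypothetical medical DataFrame indexed by an `insurance_number` column (the column names and values are invented for illustration). `.loc` looks rows up by index label, while `.iloc` looks them up by integer position.

```python
import pandas as pd

# Hypothetical medical data, indexed by insurance number
patients = pd.DataFrame(
    {
        "insurance_number": [123, 456, 789],
        "name": ["Ana", "Ben", "Chloe"],
        "age": [34, 71, 52],
    }
).set_index("insurance_number")

# Label-based lookup: the patient whose insurance number is 123
patient_123 = patients.loc[123]

# Position-based lookup: the first and last patients in the DataFrame
first_patient = patients.iloc[0]
last_patient = patients.iloc[-1]

# Select a whole column, or a single cell by row label and column name
ages = patients["age"]
age_of_123 = patients.loc[123, "age"]
```

The lookup by 123 has to go through `.loc`: `patients.iloc[123]` would treat 123 as a row position and raise an `IndexError` on this three-row frame.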

Filtering lets you choose the subset of the data you want to keep. In a travel dataset, you can filter by country of origin, destination, or age. In a medical dataset, you can filter by age group or health condition. In each case, the filter is a condition that every row either meets or fails.

Similarly, in your music dataset, you can filter artists by country of origin, or combine filters to view artists from the US who have a large number of plays.

Syntax

To keep only the rows whose country column is US, index the DataFrame with a condition:

df[df['country'] == 'US']
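To see what that one-liner does, here is a sketch on a small, hypothetical music DataFrame (artist names, columns, and play counts are invented). The inner comparison produces a boolean Series, often called a mask, and indexing the DataFrame with it keeps only the rows where the mask is True.

```python
import pandas as pd

# Hypothetical music data
df = pd.DataFrame(
    {
        "artist": ["Band A", "Band B", "Band C", "Band D"],
        "country": ["US", "UK", "US", "FR"],
        "plays": [1200, 800, 300, 950],
    }
)

# Step 1: the comparison builds a boolean mask, one True/False per row
mask = df["country"] == "US"
print(mask.tolist())  # [True, False, True, False]

# Step 2: indexing with the mask keeps only the rows where it is True
us_artists = df[mask]
print(us_artists["artist"].tolist())  # ['Band A', 'Band C']
```

The filtered result keeps the original index labels (0 and 2 here); call `.reset_index(drop=True)` if you need a fresh 0-based index.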

Travel dataset

Idea

There are many ways to filter travelers by their demographics or trip details.

One possibility is to filter travelers originating from a specific country and traveling to a certain destination. Then you can compare this subset with travelers from a different country. This would let you identify which nationalities to target in your ads for specific destinations.
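A sketch of that comparison, assuming hypothetical `origin` and `destination` columns and a hypothetical helper `travelers_from_to`. Conditions are combined with `&` (and), and each comparison is wrapped in parentheses because `&` binds more tightly than `==` in Python.

```python
import pandas as pd

# Hypothetical travel data
trips = pd.DataFrame(
    {
        "traveler": ["t1", "t2", "t3", "t4", "t5", "t6"],
        "origin": ["Germany", "Germany", "Japan", "Japan", "Japan", "Germany"],
        "destination": ["Spain", "Italy", "Spain", "Spain", "Spain", "Spain"],
    }
)

def travelers_from_to(data, origin, destination):
    """Hypothetical helper: rows whose origin AND destination both match."""
    # Parentheses are required: & has higher precedence than ==
    return data[(data["origin"] == origin) & (data["destination"] == destination)]

german_to_spain = travelers_from_to(trips, "Germany", "Spain")
japanese_to_spain = travelers_from_to(trips, "Japan", "Spain")

# Compare the sizes of the two subsets
print(len(german_to_spain), len(japanese_to_spain))  # 2 3
```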

Interview

You may be asked to apply basic filters, such as nationality or destination, as well as more complex ones. For example, you may need to find clients who use a Gmail account, which calls for a string filter. Or you may need to keep clients from a list of specific countries, which calls for the .isin() method.
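Both interview examples can be sketched as follows, assuming hypothetical `email` and `country` columns. String conditions go through the `.str` accessor (here `str.endswith`), and `.isin()` tests each value for membership in a list.

```python
import pandas as pd

# Hypothetical client data
clients = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Chloe", "Dev"],
        "email": ["ana@gmail.com", "ben@outlook.com", "CHLOE@GMAIL.COM", None],
        "country": ["US", "UK", "Canada", "US"],
    }
)

# String filter: Gmail users. Lower-case first so the match ignores case;
# na=False treats a missing email as "not a match" instead of a missing value.
is_gmail = clients["email"].str.lower().str.endswith("@gmail.com", na=False)
gmail_users = clients[is_gmail]

# List filter: clients from any of the listed countries
target_countries = ["US", "UK"]
from_targets = clients[clients["country"].isin(target_countries)]
```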

Medical dataset

Idea

Zooming in on subsets of patients can be useful; for instance, you can use filters to observe how specific medications affect them. One example is to select pregnant women who have anemia and compare this subset with women who don’t have anemia.
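A sketch of that split, assuming hypothetical boolean `pregnant` and `anemia` columns; here the comparison group is taken to be pregnant women without anemia, which is an assumption. The `~` operator inverts a mask, giving the complementary group without writing a second condition from scratch.

```python
import pandas as pd

# Hypothetical patient records
patients = pd.DataFrame(
    {
        "patient_id": [1, 2, 3, 4, 5, 6],
        "pregnant": [True, True, False, False, True, True],
        "anemia": [True, False, True, True, True, False],
    }
)

pregnant = patients["pregnant"]
anemic = patients["anemia"]

# Pregnant women with anemia
pregnant_anemic = patients[pregnant & anemic]

# Assumed comparison group: pregnant women WITHOUT anemia (~ inverts the mask)
pregnant_not_anemic = patients[pregnant & ~anemic]
```

`~` behaves as a logical "not" here only because the columns are of bool dtype; on an integer 0/1 column it flips bits instead (giving -1 and -2), so convert with `.astype(bool)` first.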

Interview

You may be asked to apply compound filters. For example, you may have to count the elderly patients who have both heart disease and diabetes.
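A sketch of that count, assuming hypothetical `age`, `heart_disease`, and `diabetes` columns and taking 65 as the old-age cut-off (an arbitrary choice for illustration). Summing a boolean mask counts its True values, so you do not need to build the filtered DataFrame just to count it.

```python
import pandas as pd

# Hypothetical patient records
patients = pd.DataFrame(
    {
        "age": [72, 80, 45, 67, 90, 66],
        "heart_disease": [True, True, True, False, True, True],
        "diabetes": [True, False, True, True, True, True],
    }
)

OLD_AGE = 65  # assumed cut-off, not a clinical standard

# Combine the three conditions with &; the comparison needs parentheses
mask = (
    (patients["age"] >= OLD_AGE)
    & patients["heart_disease"]
    & patients["diabetes"]
)

count = int(mask.sum())             # each True counts as 1
matching_patients = patients[mask]  # the rows themselves, if needed
```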

You can also be asked to find patients whose medical notes contain certain words, such as aggression or lethargy.
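A sketch of a keyword search, assuming a hypothetical free-text `notes` column. `str.contains` treats its pattern as a regular expression by default, so the keywords are joined with `|` (regex "or"); `re.escape` guards against keywords that contain regex special characters.

```python
import re

import pandas as pd

# Hypothetical free-text medical notes
records = pd.DataFrame(
    {
        "patient_id": [1, 2, 3, 4],
        "notes": [
            "Patient showed signs of Lethargy after dosage.",
            "No complaints.",
            "Episodes of aggression reported by family.",
            None,
        ],
    }
)

keywords = ["aggression", "lethargy"]
pattern = "|".join(re.escape(word) for word in keywords)  # "aggression|lethargy"

# case=False ignores capitalization; na=False treats missing notes as no match
flagged = records[records["notes"].str.contains(pattern, case=False, na=False)]
```

This matches substrings, so a keyword also matches inside longer words; wrap each keyword in `\b` word boundaries if you need whole-word matches.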

Concepts to be covered

  1. Perform basic filters using a single attribute: Think of this as filtering bands from a certain country.
  2. Extend the basic filters to include multiple columns: Think of this as filtering bands from a certain country with a high number of plays.
  3. Filter based on column values matching any item in a list: Think of this as choosing any band from the US/UK.
  4. Filter by strings: Think of this as matching text patterns in a column’s values and filtering on the result.