Performing Filtering
Explore multiple techniques to filter pandas DataFrames in Python, including boolean indexing, the loc indexer, and the query() and isin() methods. Learn how to select rows based on conditions, such as filtering by country names, to prepare data effectively for analysis.
Method 1: Boolean indexing
We use boolean indexing to filter a DataFrame by creating a boolean array that specifies the criteria we want to use to select the rows. Then, we use that boolean array to index the DataFrame and select only the rows that meet the criteria.
In the following example, we'll use boolean indexing to select only the rows in the DataFrame where the value in the COUNTRY column is UNITED KINGDOM.
Let’s review the code line by line:
Lines 1–3: We import the pandas library, load the dataset, and configure pandas to display all columns of the DataFrame.
Line 4: We create a boolean Series called mask that is True for rows in the employees_df DataFrame where the value in the COUNTRY column is UNITED KINGDOM and False for all other rows.
Line 5: We use mask to index the employees_df DataFrame and select only the rows that meet the criteria, storing them in a new DataFrame called filtered_df.
Line 6: We print the filtered_df DataFrame to see the filtered records.
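The steps above can be sketched as follows. This is a minimal, self-contained example rather than the lesson's original code: since the dataset file isn't shown here, it builds a small made-up employees_df with a hypothetical NAME column and invented rows, and uses pd.set_option("display.max_columns", None) as one common way to show all columns.

```python
import pandas as pd

# One common way to make print() show every column
pd.set_option("display.max_columns", None)

# Hypothetical stand-in for the lesson's dataset (names and rows are made up)
employees_df = pd.DataFrame({
    "NAME": ["Ana", "Ben", "Chloe", "Diego"],
    "COUNTRY": ["UNITED KINGDOM", "MEXICO", "UNITED KINGDOM", "CANADA"],
})

# Boolean Series: True where COUNTRY is "UNITED KINGDOM", False elsewhere
mask = employees_df["COUNTRY"] == "UNITED KINGDOM"

# Indexing with the mask keeps only the rows where mask is True
filtered_df = employees_df[mask]
print(filtered_df)
```

The original row labels (0 and 2 in this sample) are kept in filtered_df; calling filtered_df.reset_index(drop=True) gives a fresh 0-based index if that is preferred.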
Method 2: The loc indexer
To filter a DataFrame with loc, we specify which rows (by label or with a boolean array) and, optionally, which columns to include in the resulting DataFrame. Note that loc is an indexer used with square brackets, not a method called with parentheses. As before, we'll keep only the records whose COUNTRY value is UNITED KINGDOM, this time using loc.
Let’s review the code line by line:
Lines 1–3: We import the pandas library, load the dataset, and configure pandas to display all the DataFrame columns.
Line 4: We use loc to filter the employees_df DataFrame based on the values in the COUNTRY column, passing a condition that includes only rows where the value in the COUNTRY column is UNITED KINGDOM. We store the resulting records in a new DataFrame called filtered_df.
Line 5: We print filtered_df to see the filtered records.
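A minimal sketch of the loc approach, using a small made-up DataFrame in place of the lesson's dataset (the NAME column and all values are hypothetical). The second call also shows the column part of loc: the boolean condition picks the rows and a list of labels picks the columns.

```python
import pandas as pd

# Hypothetical sample data standing in for the lesson's dataset
employees_df = pd.DataFrame({
    "NAME": ["Ana", "Ben", "Chloe", "Diego"],
    "COUNTRY": ["UNITED KINGDOM", "MEXICO", "UNITED KINGDOM", "CANADA"],
})

# Rows: boolean condition; columns: omitted, so all columns are kept
filtered_df = employees_df.loc[employees_df["COUNTRY"] == "UNITED KINGDOM"]
print(filtered_df)

# Rows: same condition; columns: only NAME, selected by label
names_df = employees_df.loc[employees_df["COUNTRY"] == "UNITED KINGDOM", ["NAME"]]
print(names_df)
```

Passing the columns as a list (["NAME"]) returns a DataFrame; passing a single label ("NAME") would return a Series instead.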
Method 3: The query() method
To filter a DataFrame using the query() method, we pass it a string containing the condition we want to use to filter the DataFrame.
Let’s review the code line by line:
Lines 1–3: We import the pandas library, load the dataset, and configure pandas to display all the DataFrame columns.
Line 4: We use the query() method to filter the employees_df DataFrame based on the values in the COUNTRY column, specifying that we only want to include rows where the value in the COUNTRY column is UNITED KINGDOM. We store the filtered records in a new DataFrame called filtered_df.
Line 5: We print filtered_df to see the filtered records.
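A minimal sketch of query() on a small made-up DataFrame (the NAME column and all values are hypothetical, not the lesson's data). The second call uses the @ prefix, which lets the query string refer to a Python variable in the surrounding scope instead of hard-coding the value.

```python
import pandas as pd

# Hypothetical sample data standing in for the lesson's dataset
employees_df = pd.DataFrame({
    "NAME": ["Ana", "Ben", "Chloe", "Diego"],
    "COUNTRY": ["UNITED KINGDOM", "MEXICO", "UNITED KINGDOM", "CANADA"],
})

# The condition is a string; column names act like variables inside it
filtered_df = employees_df.query('COUNTRY == "UNITED KINGDOM"')
print(filtered_df)

# @target refers to the local Python variable named target
target = "UNITED KINGDOM"
same_df = employees_df.query("COUNTRY == @target")
print(same_df)
```

Both calls select the same rows. If a column name contains spaces, it has to be wrapped in backtick characters inside the query string.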
Method 4: The isin() method
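To filter with the isin() method, we call it on a column and pass it a list of the values we want to keep. It returns a boolean Series that is True wherever the column's value appears in the list, and we use that Series to index the DataFrame.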
Let’s review the code line by line:
Lines 1–3: We import the pandas library, load the dataset, and configure pandas to display all the DataFrame columns.
Line 4: We use the isin() method to filter the employees_df DataFrame based on the values in the COUNTRY column, specifying that we only want to include rows where the value in the COUNTRY column is UNITED KINGDOM or MEXICO. We store the filtered records in a new DataFrame called filtered_df.
Line 5: We print filtered_df to see the filtered records.
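A minimal sketch of isin() on a small made-up DataFrame (the NAME column and all values are hypothetical). It also shows the ~ operator, which inverts the boolean mask to keep the rows whose value is not in the list.

```python
import pandas as pd

# Hypothetical sample data standing in for the lesson's dataset
employees_df = pd.DataFrame({
    "NAME": ["Ana", "Ben", "Chloe", "Diego"],
    "COUNTRY": ["UNITED KINGDOM", "MEXICO", "UNITED KINGDOM", "CANADA"],
})

# Values we want to keep
countries = ["UNITED KINGDOM", "MEXICO"]

# isin() returns a boolean Series: True where COUNTRY appears in the list
mask = employees_df["COUNTRY"].isin(countries)

filtered_df = employees_df[mask]
print(filtered_df)

# ~ inverts the mask, keeping rows whose COUNTRY is NOT in the list
excluded_df = employees_df[~mask]
print(excluded_df)
```

The mask produced by isin() is equivalent to chaining equality checks with |, for example (employees_df["COUNTRY"] == "UNITED KINGDOM") | (employees_df["COUNTRY"] == "MEXICO"), but it stays readable as the list of values grows.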