Applying Functions to Data

This lesson teaches us how to apply user-defined functions to individual items in a dataset.

We'll cover the following...

Functions on individual Items of a column

Example: Converting numbers
Example: Converting numerical values to categories

During data analysis, we often need to perform calculations on our data and generate new data or output from it. Pandas makes it easy to apply user-defined operations (functions, in Python terminology) to individual data items, rows, and columns of a dataframe.
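As a rough illustration of those three levels, the sketch below applies small functions to individual items, to whole columns, and to whole rows. The dataframe and the helper names (double, value_range, row_total) are made up for this example and are not part of the lesson's dataset.

```python
import pandas as pd

# Made-up values, used only for illustration
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# Hypothetical helper: operates on one value at a time
def double(value):
    return value * 2

# Hypothetical helper: operates on a whole column (a Series)
def value_range(column):
    return column.max() - column.min()

# Hypothetical helper: operates on a whole row (a Series)
def row_total(row):
    return row['a'] + row['b']

# Individual items: Series.apply calls the function once per value
doubled = df['a'].apply(double)

# Columns: DataFrame.apply with axis=0 (the default) passes each column
column_ranges = df.apply(value_range, axis=0)

# Rows: DataFrame.apply with axis=1 passes each row
row_totals = df.apply(row_total, axis=1)

print(doubled.tolist())        # [2, 4, 6]
print(column_ranges.tolist())  # [2, 20]
print(row_totals.tolist())     # [11, 22, 33]
```

The rest of this lesson only uses the first form, applying a function to the individual items of one column.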

Functions on individual Items of a column

Pandas provides an apply method, which calls the function we give it on the data. Part of the appeal of pandas is how convenient apply makes this kind of operation, although for simple arithmetic the built-in vectorized operations are usually faster. We will be using the California Housing Dataset. All of the data is in the housing.csv file.

Example: Converting numbers

We first define the function that we want to apply. We write it to operate on a single value. When it is applied to a column, it receives each value of that column in turn, so it ends up being used on every value in the column.

We then select the column median_income and call apply on it, passing the name of the function we defined. We store the results as converted and replace the column in the original dataframe with converted.
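The steps described above can be sketched as follows. The original listing is not reproduced here, so the body of the function is an assumption: it converts a value given in tens of thousands of dollars into dollars. The function name convert and the small inline dataframe (standing in for housing.csv) are also hypothetical.

```python
import pandas as pd

# Assumed conversion (not confirmed by the lesson text):
# tens of thousands of dollars -> dollars
def convert(value):
    return value * 10000

# Made-up stand-in for housing.csv, for illustration only
df = pd.DataFrame({'median_income': [8.3252, 3.5, 1.2]})

print('Original Values:')
print(df['median_income'].head())

# Apply the function to each value of the column
converted = df['median_income'].apply(convert)
print('Converted Values:')
print(converted.head())

# Replace the original column with the converted values
df['median_income'] = converted
```

With the real dataset, the inline dataframe would be replaced by pd.read_csv('housing.csv').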

Example: Converting numerical values to categories

# Function to convert a single numerical value to a category
def convert_categories(value):
    if value > 10:
        return 'high-incomes'
    elif value > 2:  # above 2 and up to 10, including exactly 10
        return 'moderate-incomes'
    else:
        return 'low-incomes'

# Read file
import pandas as pd
df = pd.read_csv('housing.csv')

# Print original values
print('Original Values:')
print(df['median_income'].head())

# Apply function on the column
categories = df['median_income'].apply(convert_categories)
print('Converted Values:')
print(categories.head())

# Make a new column in the dataframe
df['income_category'] = categories

We have defined our function convert_categories in lines 2-8. Remember that the values in this column are in tens of thousands of dollars. If the value is greater than 10, we call it a high-income housing block. If the value is between 2 and 10, it is a moderate-income housing block. Otherwise, it is a low-income housing block.

In line 19 we select the column median_income and use the apply function. We give it the name of the function we defined above and store the results as categories. We add a new column in the original dataframe named income_category and save our categories in that column in line 24.
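One quick way to inspect the new column is to count how many rows fall into each category. The sketch below does this on a small made-up dataframe standing in for housing.csv. The use of value_counts is an addition for illustration, not part of the original listing.

```python
import pandas as pd

# Same thresholds as the lesson's function
def convert_categories(value):
    if value > 10:
        return 'high-incomes'
    elif value > 2:
        return 'moderate-incomes'
    else:
        return 'low-incomes'

# Made-up incomes (in tens of thousands of dollars), for illustration only
df = pd.DataFrame({'median_income': [12.5, 8.3, 3.1, 1.7, 0.9]})

# Apply the function and store the result as a new column
df['income_category'] = df['median_income'].apply(convert_categories)

# Count the rows in each category
counts = df['income_category'].value_counts()
print(counts)
```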
