Exploring Categorical Quantities
This lesson explores, with examples, the relationships between different categorical variables in the dataset.
Exploratory Data Analysis is all about exploring relationships in the dataset that might be hidden or might not be easy to spot just by looking at the dataset. We will try to explore these kinds of relationships in the Default of Credit Card Clients Dataset. We will use the cleaned version of the dataset from the lesson Inconsistent Data. The details of individual columns are mentioned below.
More specifically, we are interested in finding out how the variable default.payment.next.month is affected by other variables.
Grouping
As we saw in Chapter 3 of this course, grouping data can give us very useful insights. Let’s see how the categorical variables GENDER, EDUCATION, and MARRIAGE are related to default.payment.next.month.
GENDER
We group the data by GENDER and default.payment.next.month on line 6 and use the function size to count the rows in each group, which gives the number of males and females in each default category. We then use the function unstack in the next line. The function unstack performs two steps here:
- It changes the table into a dataframe
- It names the columns no and yes, the two categories of the variable default.payment.next.month.
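The grouping steps described above can be sketched on a few made-up rows. This is a minimal illustration, not the lesson's original code: the toy dataframe and the value labels male, female, yes, and no are assumptions standing in for the cleaned dataset.

```python
import pandas as pd

# Hypothetical toy rows standing in for the cleaned dataset (assumed labels)
df = pd.DataFrame({
    "GENDER": ["male", "female", "female", "male", "female", "female"],
    "default.payment.next.month": ["yes", "no", "no", "no", "yes", "no"],
})

# Count the rows in each (GENDER, default) combination; the result is a
# Series indexed by both grouping columns
counts = df.groupby(["GENDER", "default.payment.next.month"]).size()

# Move the inner index level (default.payment.next.month) into columns,
# giving a dataframe with one row per gender and one column per category
table = counts.unstack()
print(table)
```

Here size returns a Series with a two-level index, and unstack pivots the inner level into the columns no and yes, so each row shows the counts for one gender.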
We can see the resultant dataframe ...