Quantitative vs. Qualitative Data and Creating Dummies
Explore the distinctions between quantitative and qualitative data, and learn how to process categorical variables by creating dummy variables with pandas. Dropping one dummy column removes redundancy, which gives machine learning models more efficient inputs and makes the data easier to handle.
Data can contain quantitative or qualitative variables. So far, we have worked with several datasets whose feature variables (X) were all numerical. Let's now look at the difference between numerical (discrete and continuous) and categorical (nominal and ordinal) features.
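As a minimal sketch, assuming a small hypothetical DataFrame, pandas can separate numerical columns from the rest by dtype with `select_dtypes`. Treat this only as a first pass: dtype cannot tell nominal from ordinal, and categories that happen to be stored as numbers (for example, ID codes) would be reported as numerical.

```python
import pandas as pd

# Hypothetical dataset mixing numerical and categorical features
df = pd.DataFrame({
    "students": [10, 20, 30],               # discrete numerical
    "price": [10.0, 10.50, 50.25],          # continuous numerical
    "animal": ["cat", "dog", "cat"],        # nominal categorical
    "size": ["small", "large", "medium"],   # ordinal categorical (hypothetical)
})

# Columns with a numeric dtype (integer or float)
numerical_cols = df.select_dtypes(include="number").columns.tolist()
# All remaining columns (here, text columns) are treated as categorical;
# dtype alone does not reveal whether they are nominal or ordinal
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(numerical_cols)    # ['students', 'price']
print(categorical_cols)  # ['animal', 'size']
```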
Quantitative data
Quantitative data, also called numerical data, contains numerical variables that can be discrete or continuous.
Discrete
Discrete data can take only specific, separate values, typically whole numbers (counts) from a finite or countable set:
Students: {10, 20, 30}
Deaths: {1, 5, 6}
Patients: {100, 400, 1000}
We can't have 10.5 students or 1.5 deaths.
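One rough way to sanity-check a discrete count column is to confirm that every value is a whole number. The helper `looks_discrete` below is hypothetical, and passing the check is not proof of discreteness: a weight rounded to whole kilograms would pass too.

```python
import pandas as pd

def looks_discrete(values: pd.Series) -> bool:
    """Hypothetical heuristic: True if every non-missing value is whole."""
    clean = values.dropna()
    return bool((clean % 1 == 0).all())

students = pd.Series([10, 20, 30], name="students")
impossible = pd.Series([10, 10.5, 30], name="students")  # 10.5 students

print(pd.api.types.is_integer_dtype(students))  # True
print(looks_discrete(students))     # True
print(looks_discrete(impossible))   # False
```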
Continuous
Continuous data can take any value within a range, including fractional values, so it has infinitely many possible values, such as:
Weight: {1, 1.1, 3.5, 3.5555555}
Price: {10, 10.50, 50.25}
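pandas stores continuous values like these as floats; when whole and fractional numbers appear in the same column, the whole column is upcast to a float dtype. A minimal sketch using the example values above:

```python
import pandas as pd

# Example values from the lists above
weight = pd.Series([1, 1.1, 3.5, 3.5555555], name="weight")
price = pd.Series([10, 10.50, 50.25], name="price")

# Mixing ints and floats upcasts each column to a float dtype
print(pd.api.types.is_float_dtype(weight))  # True
print(pd.api.types.is_float_dtype(price))   # True

# At least one weight is fractional, which a count could never be
print(bool((weight % 1 != 0).any()))  # True
```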
Qualitative data
Qualitative data, also called categorical data, contains categorical variables that describe characteristics or qualities rather than measured amounts. Categorical variables can be nominal or ordinal.
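pandas can record the nominal/ordinal distinction explicitly with `pd.Categorical`: leaving `ordered` at its default gives an unordered (nominal) variable, while `ordered=True` with an explicit category list gives an ordinal one. The `size` scale below is a hypothetical example; a minimal sketch:

```python
import pandas as pd

# Nominal: categories with no inherent order
animal = pd.Categorical(["cat", "dog", "cat"])
print(animal.ordered)  # False

# Ordinal: categories with a meaningful order (hypothetical size scale)
size = pd.Categorical(
    ["small", "large", "medium"],
    categories=["small", "medium", "large"],
    ordered=True,
)
print(size.ordered)         # True
print(size.max())           # large (only defined because the order is known)
print(size.codes.tolist())  # [0, 2, 1], positions in the category order
```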
Nominal
Nominal variables have categories with no inherent order, such as:
Animal: {cat, dog}
Time: {dinner, lunch}
Blood_group: {A, ...