Explore the Dataset
Explore the dataset to understand the similarities and differences among the customers.
Data exploration
In this lesson, we’ll start understanding a set of customers through segmentation. We will use a semiprocessed Fast Moving Consumer Goods (FMCG) dataset that contains customer information for 2,000 people. The dataset has eight columns containing both numerical and categorical values.
Here is some basic information about the dataset.
Feature Details
Feature | Data Type | Details |
Gender | String (categorical) | Male, Female |
Marital Status | String (categorical) | Married, Single |
Age | Integer (numeric) | Customer Age |
Education | String (categorical) | High School, University, Graduate, Unknown |
Income | Integer (numeric) | Yearly income of a customer |
Occupation | String (categorical) | Official, Management, Unemployed |
Settlement Size | String (categorical) | Small City, Mid City, Big City |
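The table above can also serve as a checklist once the data is loaded. The following is a minimal sketch, not part of the lesson's own code: it assumes the CSV column names match the feature names in the table exactly (the real headers may differ), and find_schema_problems is a hypothetical helper. The two sample rows are made up for illustration and are not records from the dataset.

```python
import pandas as pd

# Expected categories, copied from the feature table above.
# Assumption: the column names equal the feature names in the table.
EXPECTED_CATEGORIES = {
    "Gender": {"Male", "Female"},
    "Marital Status": {"Married", "Single"},
    "Education": {"High School", "University", "Graduate", "Unknown"},
    "Occupation": {"Official", "Management", "Unemployed"},
    "Settlement Size": {"Small City", "Mid City", "Big City"},
}
NUMERIC_COLUMNS = ["Age", "Income"]


def find_schema_problems(df):
    """Return a list of readable problems; the list is empty if df matches the table."""
    problems = []
    for column, allowed in EXPECTED_CATEGORIES.items():
        if column not in df.columns:
            problems.append(f"missing column: {column}")
            continue
        # Ignore missing values, then look for labels the table does not list.
        unexpected = set(df[column].dropna().unique()) - allowed
        if unexpected:
            problems.append(f"{column}: unexpected values {sorted(unexpected)}")
    for column in NUMERIC_COLUMNS:
        if column not in df.columns:
            problems.append(f"missing column: {column}")
        elif not pd.api.types.is_integer_dtype(df[column]):
            problems.append(f"{column}: expected integer dtype, got {df[column].dtype}")
    return problems


# Two made-up rows for illustration only; not taken from the real dataset.
sample = pd.DataFrame(
    {
        "Gender": ["Male", "Female"],
        "Marital Status": ["Single", "Married"],
        "Age": [35, 42],
        "Education": ["University", "High School"],
        "Income": [50000, 62000],
        "Occupation": ["Official", "Management"],
        "Settlement Size": ["Small City", "Big City"],
    }
)
print(find_schema_problems(sample))
```

An empty list means every column is present, each categorical column contains only the labels listed in the table, and Age and Income are stored as integers. The same function could be called on the loaded customer DataFrame, provided the column names are adjusted to the actual CSV headers.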
Load the dataset
Let’s import the essential Python libraries and the raw FMCG customer dataset. After importing, it’s good practice to inspect the dataset’s dimensions, the data type of each column, and the first few records.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# load csv dataset
df_customers = pd.read_csv('customers_raw.csv', header=0, index_col='CustomerID')

print('Dimension of the dataset:')
print(df_customers.shape)

print('\nTop five records:')
print(df_customers.head().to_string())

print('\nColumn data types:')
df_customers.info()  # info() prints its report directly and returns None
Explanation
In line 7, we load the raw customer dataset using the pandas read_csv() function.
In line 10, we inspect the dimensions of the dataset.
In line 13, we take a look at a ...