Explore the Dataset

Explore the dataset to understand the similarities and differences among the customers.

Data exploration

In this lesson, we’ll start to understand a set of customers through segmentation. We will use a semi-processed fast-moving consumer goods (FMCG) dataset that contains information on 2,000 customers. The dataset has eight columns that hold both numerical and categorical values.

Here is some basic information about the dataset.

Feature Details

Feature          Data Type             Details
Gender           String (categorical)  Male, Female
Marital Status   String (categorical)  Married, Single
Age              Integer (numeric)     Customer age
Education        String (categorical)  High School, University, Graduate, Unknown
Income           Integer (numeric)     Yearly income of a customer
Occupation       String (categorical)  Official, Management, Unemployed
Settlement Size  String (categorical)  Small City, Mid City, Big City
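Because the features mix numeric and categorical types, it can help to separate the two kinds before exploring them. The following is a minimal sketch, not the lesson's own code: the rows in `sample` are made up for illustration, and the column names are assumed to match the feature names in the table, which may differ from the headers in the actual CSV file.

```python
import pandas as pd

# Hypothetical sample rows that follow the feature table above;
# the values are invented for illustration and are not from the real dataset.
sample = pd.DataFrame(
    {
        "Gender": ["Male", "Female", "Female"],
        "Marital Status": ["Single", "Married", "Single"],
        "Age": [34, 52, 27],
        "Education": ["University", "High School", "Graduate"],
        "Income": [72000, 48000, 91000],
        "Occupation": ["Official", "Unemployed", "Management"],
        "Settlement Size": ["Mid City", "Small City", "Big City"],
    }
)

# Split the columns by kind: numeric features versus everything else.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("Numeric columns:", numeric_cols)
print("Categorical columns:", categorical_cols)

# Numeric features: summary statistics (count, mean, quartiles, and so on).
print(sample[numeric_cols].describe())

# Categorical features: how often each category occurs.
for col in categorical_cols:
    print(sample[col].value_counts())
```

Selecting by `"number"` rather than naming a specific string dtype keeps the split independent of how a given pandas version stores text columns.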

Load the dataset

Let’s import the essential Python libraries and the raw FMCG customer dataset. Right after importing a dataset, it’s good practice to inspect its dimensions, the data type of each column, and its first few records.

Python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# load the CSV dataset, using the CustomerID column as the index
df_customers = pd.read_csv('customers_raw.csv', header=0, index_col='CustomerID')

print('Dimension of the dataset:')
print(df_customers.shape)

print('\nTop five records:')
print(df_customers.head().to_string())

print('\nColumn data types:')
df_customers.info()  # info() prints its report itself and returns None

Explanation

  • In line 7, we load the raw customer dataset using the pandas read_csv() function.

  • In line 10, we inspect the dimensions of the dataset.

  • In line 13, we take a look at a ...