Explore the Dataset

Explore customer segmentation by examining a raw FMCG dataset with Python. Understand key relationships between age, gender, income, and occupation through data visualization methods to uncover insights for clustering.

Data exploration

In this lesson, we’ll begin to understand a set of customers through segmentation. We will use a semiprocessed fast-moving consumer goods (FMCG) dataset that contains information on 2,000 customers. The dataset has eight columns that hold both numerical and categorical values.

Here is some basic information about the dataset.

Feature Details

| Feature         | Data Type            | Details                                    |
|-----------------|----------------------|--------------------------------------------|
| Gender          | String (categorical) | Male, Female                               |
| Marital Status  | String (categorical) | Married, Single                            |
| Age             | Integer (numeric)    | Customer age                               |
| Education       | String (categorical) | High School, University, Graduate, Unknown |
| Income          | Integer (numeric)    | Yearly income of a customer                |
| Occupation      | String (categorical) | Official, Management, Unemployed           |
| Settlement Size | String (categorical) | Small City, Mid City, Big City             |

Load the dataset

Let’s import the essential Python libraries and the raw FMCG customer dataset. After importing a dataset, it’s good practice to inspect its dimensions, the data type of each column, and the first few records.

Python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# load csv dataset
df_customers = pd.read_csv('customers_raw.csv', header=0, index_col='CustomerID')
print('Dimension of the dataset:')
print(df_customers.shape)
print('\nTop five records:')
print(df_customers.head().to_string())
print('\nColumn data types:')
# info() prints its summary itself and returns None, so it is not wrapped in print()
df_customers.info()

Explanation

  • In line 7, we load the raw customer dataset using the pandas read_csv() ...
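
Once the shape, first records, and data types look right, a natural next check is whether any values are missing and how the categorical and numeric columns are distributed. The following is only a minimal sketch of that idea, not part of the lesson's own code. It runs on a handful of made-up rows so that it works without customers_raw.csv. The column names (Gender, Age, Income, Occupation) are assumed from the feature table, and the name df_toy is hypothetical; the real file may differ.

```python
import io

import pandas as pd

# Hypothetical toy rows, invented for illustration only. The real
# customers_raw.csv may use different column names and values.
toy_csv = io.StringIO(
    "CustomerID,Gender,Age,Income,Occupation\n"
    "1,Male,34,52000,Official\n"
    "2,Female,45,78000,Management\n"
    "3,Female,29,0,Unemployed\n"
    "4,Male,51,61000,Official\n"
)
df_toy = pd.read_csv(toy_csv, header=0, index_col="CustomerID")

# Separate numeric columns from the rest so each kind gets a suitable summary.
numeric_cols = df_toy.select_dtypes(include="number").columns.tolist()
categorical_cols = df_toy.select_dtypes(exclude="number").columns.tolist()

# Count missing values per column.
missing_counts = df_toy.isna().sum()
print("Missing values per column:")
print(missing_counts.to_string())

# Show how often each category occurs.
for col in categorical_cols:
    print(f"\nValue counts for {col}:")
    print(df_toy[col].value_counts().to_string())

# Summary statistics (count, mean, quartiles, and so on) for numeric columns.
print("\nSummary of numeric columns:")
print(df_toy[numeric_cols].describe().to_string())
```

With the real dataset, the same calls could be applied to df_customers in place of df_toy, assuming its columns follow the feature table.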