
Exploratory Data Analysis

Explore how to conduct exploratory data analysis on numeric and categorical variables using histograms, bar charts, scatter plots, and hue mapping. Understand data distribution and relationships essential for preparing regression models with PyCaret.

We’ll now perform EDA on our data. As mentioned earlier, EDA helps us understand a dataset’s properties through descriptive statistics and visualization. It is an important part of every machine learning or data science project because we need to understand the dataset before we use it.
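As a rough illustration of the descriptive-statistics side of EDA, the sketch below summarizes a small stand-in DataFrame with pandas. The rows are made up for the example, and the `smoker` column is a hypothetical categorical variable that does not come from the lesson; the lesson's real `data` is loaded elsewhere.

```python
import pandas as pd

# Hypothetical stand-in rows; the lesson's real `data` is loaded elsewhere.
sample = pd.DataFrame({
    "age": [19, 33, 46, 52, 61],
    "bmi": [27.9, 22.7, 33.4, 30.8, 29.1],
    "smoker": ["yes", "no", "no", "no", "yes"],  # assumed categorical column
})

# Count, mean, spread, and quartiles for the numeric columns only.
numeric_summary = sample.describe()
print(numeric_summary)

# Frequency of each category in a categorical column.
smoker_counts = sample["smoker"].value_counts()
print(smoker_counts)
```

By default, `describe()` covers only the numeric columns of a mixed DataFrame, which is why the categorical column is summarized separately with `value_counts()`.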

Histogram of numeric variables

The distribution of numeric variables can be visualized with histograms, which are easy to create with the pandas hist() method.

import matplotlib.pyplot as plt  # needed for plt.show(); skip if already imported

# Histogram of numeric variables
numeric = ['age', 'bmi', 'children', 'charges']
data[numeric].hist(bins=20, figsize=(10, 5))
plt.show()

As we can see in the output, some of the variables have right-skewed distributions that may cause ...
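The right skew visible in a histogram can also be checked numerically. The following is a minimal sketch using the pandas `skew()` method on made-up values, chosen so that `charges` has a long right tail; it is not the lesson's dataset and the numbers carry no meaning beyond the illustration.

```python
import pandas as pd

# Hypothetical values; `charges` has one very large entry (a long right tail).
sample = pd.DataFrame({
    "age": [20, 30, 40, 50, 60],
    "charges": [1000.0, 1200.0, 1500.0, 2000.0, 40000.0],
})

# Sample skewness per column: near zero for symmetric data,
# positive when the right tail is longer than the left.
skewness = sample.skew()
print(skewness)
```

A value close to zero suggests a roughly symmetric distribution, while a clearly positive value matches the right-skewed shape seen in a histogram.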