Exploratory Data Analysis
Explore how to conduct exploratory data analysis on numeric and categorical variables using histograms, bar charts, scatter plots, and hue mapping. Understand data distribution and relationships essential for preparing regression models with PyCaret.
We’ll now perform exploratory data analysis (EDA) on our data. As mentioned earlier, EDA helps us understand a dataset’s properties through descriptive statistics and visualization. It is an important part of every machine learning or data science project because we need to understand the dataset before we use it.
Histogram of numeric variables
The distribution of numeric variables can be visualized with a histogram that can be easily created with the hist() function.
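As a minimal sketch of this step, the snippet below assumes that hist() refers to the pandas DataFrame.hist() method and that the data is held in a pandas DataFrame. The DataFrame and its column names (symmetric_var, skewed_var, category) are hypothetical stand-ins generated at random, not the lesson's dataset.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in data; replace df with the lesson's actual DataFrame.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "symmetric_var": rng.normal(loc=50, scale=10, size=1000),
    "skewed_var": rng.exponential(scale=2000, size=1000),
    "category": rng.choice(["a", "b"], size=1000),
})

# DataFrame.hist() draws one histogram per numeric column;
# the non-numeric "category" column is left out.
axes = df.hist(bins=30, figsize=(10, 4))
plt.tight_layout()

# A clearly positive skewness value is a numeric hint that a
# distribution is right-skewed.
skewness = df.skew(numeric_only=True)
print(skewness)
```

In a notebook the figure renders inline; in a script, call plt.show() to display it. The printed skewness values only describe this synthetic data and say nothing about the lesson's variables.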
As we can see in the output, some of the variables have right-skewed distributions that may cause ...