Regression with PyCaret

Let’s learn how to import the necessary libraries and load a dataset for regression with PyCaret.

We'll cover the following...

The linear regression model
Regression methods in PyCaret
Importing the necessary libraries
Loading the dataset

The linear regression model

A fundamental task in supervised machine learning is regression, where the goal is to predict a continuous value. This is done by modeling the relationship between the target variable $y$ and the feature variables $x$ in a given dataset. One of the most basic regression models is linear regression, defined by the following equation. The equivalent vectorized form is also given: it is the inner product of the coefficient vector $\beta$ and the feature vector $X_n$, written $\beta^{T} X_n$, where $X_n$ includes a leading 1 so that the intercept $\beta_{0}$ is absorbed into the product.

$$y_{n}=\beta_{0}+\beta_{1} x_{n1}+\cdots+\beta_{p} x_{np}+\epsilon_{n}=\beta^{T} X_{n}+\epsilon_{n}$$

$y_{n}$ is the target variable for the $n$th instance of the given dataset.
$x_{n1}$ to $x_{np}$ are the feature variables of the $n$th instance.
$\beta_{0}$ is the intercept term.
$\beta_{1}$ to $\beta_{p}$ are the coefficients of the feature variables.
$\epsilon_{n}$ is the error term for the $n$th instance.
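
To make the two forms of the equation concrete, here is a minimal NumPy sketch. The coefficients and feature values are made up for illustration and do not come from any dataset in this course, and the error term is left out. The sketch prepends a constant 1 to the feature vector so that the intercept becomes part of the inner product.

```python
import numpy as np

# Hypothetical coefficients: beta_0 (intercept), beta_1, beta_2
beta = np.array([1.0, 2.0, -0.5])

# Hypothetical feature values of one instance: x_n1, x_n2
x_n = np.array([3.0, 4.0])

# Prepend a constant 1 so the intercept is absorbed into the inner product
X_n = np.concatenate(([1.0], x_n))

# Term-by-term form: beta_0 + beta_1 * x_n1 + beta_2 * x_n2
y_sum = beta[0] + beta[1] * x_n[0] + beta[2] * x_n[1]

# Vectorized form: beta^T X_n
y_vec = beta @ X_n

print(y_sum, y_vec)
```

Both expressions produce the same prediction, which is why the vectorized form is the one normally used in practice.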

Regression methods in PyCaret

Besides linear regression, we have other regression models such as lasso, random forest, support vector machines, and gradient boosting. In the remaining lessons, we’ll see how PyCaret can help us choose and train the optimal regression model for a specific dataset. We’ll also learn about exploratory data analysis (EDA), a method that lets us examine and understand the basic statistical properties of a dataset.

Importing the necessary libraries

First, we import the Python libraries that are necessary for our project.


Some standard machine learning libraries are included, such as pandas, Matplotlib, and Seaborn. We also import all PyCaret functions that are related to regression. The last line specifies that Matplotlib figures will have a 300 DPI resolution, but we can omit that if we wish.

Loading the dataset

Machine learning projects can only succeed if the appropriate data is available, so PyCaret includes a variety of datasets that can be used to test its features. In this chapter, we’ll use insurance.csv, a dataset that originates from the book Machine Learning with R by Brett Lantz. This is a health insurance dataset, where the features are various attributes including age, sex, body mass index (BMI), whether the person is a smoker or not, number of children, and US region. Furthermore, the dataset’s target variable is the billed charges for every individual. Real-world data is usually more complex, but working with so-called toy datasets will help us grasp the concepts and techniques before dealing with more difficult cases.

We use PyCaret's get_data() function to load the dataset into a pandas DataFrame.

