What is Regression ?

You'll learn about Regression in this lesson. It is a concept that has been borrowed from Inferential Statistics and involves predicting a real-valued output.

What is Regression?

Regression comes under supervised learning and it involves predicting a real-valued output. Classification predicts a discrete-valued output.

Key terms

Input column or Independent features

The columns that are used to predict the output column are called the input columns or independent features. These are denoted as x1x_1, x2x_2, x3x_3, … xnx_n where x1x_1 is the first feature and so on. Note that nn denotes the total number of features or dimensions.

Output column or dependent feature

The column that is to be predicted is called the output column or dependent feature. It is denoted as yy.

Total instances or samples

  • Total number of rows or instances or samples are denoted as mm.

  • Samples, rows, or instances are denoted as x1x^1, x2x^2, …, and xmx^m.

  • The first feature value for the first sample is denoted as x11x_1^1. The second feature value for the first sample is denoted as x21x_2^1. The nthn^{th} feature value for the first sample is denoted as xn1x_n^1.

  • If we have nn = 2 features and mm = 3 instances, then we will have the following representation of individual values:

Row 11 = x11x_1^1 , x21x_2^1, y1y^1

Row 22 = x12x_1^2 , x22x_2^2, y2y^2

Row 33 = x13x_1^3 , x23x_2^3, y3y^3

Row mm = x1mx_1^m , x2mx_2^m, ymy^m

Training dataset

The dataset taken out of the original preprocessed labeled dataset on which the Machine Learning model is trained is called the training dataset.

Test dataset

The unseen dataset on which the model is evaluated is called the test dataset.

Validation dataset

We use the dataset, along with the training data set, to evaluate the performance of the trained model. Depending on the model’s performance, we either train the model again or finalize it.

Numerical and categorical features

  • Numerical columns or features have real number values present in them, like height, price of a house, etc.

  • Categorical features have values with which category can be associated. For example, sex can be male or female.

  • Categorical features can be ordinal and nominal.

  • Ordinal categorical variables have an order associated with them such as lower class, middle class, and upper class.

  • Nominal categorical variables don’t follow any order like sex.

  • In Python Numerical features have the type float or int. Categorical features are usually represented as object type when they are stored in Pandas Dataframe.

Get hands-on with 1200+ tech skills courses.