Linear Regression is where Machine Learning begins for many people: it is usually the first algorithm beginners learn before moving on to build more ambitious projects.
Let’s start with the basic concept and take a tour through the worlds of statistics and Machine Learning. Linear Regression essentially means fitting a line to a set of points representing the observed features and outputs.
Linear Regression is important not only for ML but also for statistics. The method of Least Square estimation is used in statistics to approximate the solution of linear regression by minimizing the sum of squared distances of the points from the regression line.
The hypothesis function represents the equation of the line to be fitted:

hθ(x) = θ₀ + θ₁x

Here θ₀ and θ₁ are the parameters of the regression line. In the familiar line equation y = mx + c, m is the slope and c is the y-intercept. In the hypothesis above, θ₀ is the y-intercept and θ₁ is the slope of the regression line.
Note: Here we are dealing with a single independent variable (x).
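To make the hypothesis concrete, here is a minimal illustrative Python sketch (not from the original article; the function name hypothesis is chosen here just for demonstration):

import numpy as np

def hypothesis(theta_0, theta_1, x):
    # h_theta(x) = theta_0 + theta_1 * x
    return theta_0 + theta_1 * x

# A line with intercept 2 and slope 3, evaluated at a few points
print(hypothesis(2, 3, np.array([0, 1, 4])))  # [ 2  5 14]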
The cost function is the function we have to minimize to obtain the optimum line. The difference between hθ(x) and y is known as the error, and we take the mean of the squared errors as the cost function:

J(θ₀, θ₁) = (1/n) Σ (hθ(xᵢ) − yᵢ)²
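As a small sketch (an assumed helper, not part of the original code), this mean-squared-error cost can be computed directly with NumPy:

import numpy as np

def mse_cost(theta_0, theta_1, x, y):
    # Mean of the squared errors between the predictions and the observed outputs
    predictions = theta_0 + theta_1 * x
    return np.mean((predictions - y) ** 2)

x = np.array([0, 1, 2, 3])
y = np.array([1, 3, 5, 7])
print(mse_cost(1, 2, x, y))  # 0.0, since y = 1 + 2x fits these points exactly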
The equations for calculating θ₀ and θ₁ are given below; computing the parameters this way is known as the Least Square estimation method:

θ₁ = Σ (xᵢ − x̄)(yᵢ − ȳ) / Σ (xᵢ − x̄)²
θ₀ = ȳ − θ₁x̄

Here xᵢ represents the feature (independent variable) of each sample and x̄ its mean, yᵢ represents the output (dependent variable) of each sample and ȳ its mean, and n is the total number of samples.
After applying the above equations, we can find the best fitting line for the scattered points. The Python code for this is represented below.
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # Least Square estimates of the regression coefficients
    n = np.size(x)
    m_x, m_y = np.mean(x), np.mean(y)
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x
    theta_1 = SS_xy / SS_xx
    theta_0 = m_y - theta_1 * m_x
    return (theta_0, theta_1)

def plot_regression_line(x, y, theta):
    # Scatter plot of the data points and the fitted line
    plt.scatter(x, y, color="b", marker="o", s=30)
    y_pred = theta[0] + theta[1] * x
    plt.plot(x, y_pred, color="r")
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([11, 13, 12, 15, 17, 18, 18, 19, 20, 22])
theta = estimate_coef(x, y)
print("Estimated coefficients:\ntheta_0 = {} \ntheta_1 = {}".format(theta[0], theta[1]))
plot_regression_line(x, y, theta)
# Prediction for x = 11
print(round(theta[0] + theta[1] * 11, 4))
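As a quick cross-check (not part of the original walkthrough), NumPy’s built-in np.polyfit fits the same degree-1 polynomial and should produce essentially the same coefficients and prediction:

import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([11, 13, 12, 15, 17, 18, 18, 19, 20, 22])

# polyfit returns the coefficients highest degree first: [theta_1, theta_0]
theta_1, theta_0 = np.polyfit(x, y, 1)
print(theta_0, theta_1)
print(round(theta_0 + theta_1 * 11, 4))  # prediction for x = 11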
The same problem of linear regression can be solved in Machine Learning in three different ways. The methods are: using a built-in library function (scikit-learn’s LinearRegression), Gradient Descent, and the Moore-Penrose pseudo-inverse.
The simplest method is to use a built-in library function (the code for this is given below). The dataset is the same as the one used above. After fitting the line, we need to find the value of y for x = 11; we will use the same dataset and input value for all of the methods.
The LinearRegression() estimator’s fit() method takes its inputs as array-like (dense or sparse) matrices of shape (n_samples, n_features) for X and (n_samples,) or (n_samples, n_targets) for y.
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]])
y = np.array([[11], [13], [12], [15], [17], [18], [18], [19], [20], [22]])

# Fit the model and predict the output for x = 11
LR = LinearRegression()
LR.fit(x, y)
b = LR.predict(np.array([[11]]))
print(round(b[0][0], 4))
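If you also want the fitted parameters themselves rather than just a prediction, the fitted estimator exposes them as attributes; a short sketch, reusing the LR object from the code above:

# theta_1 and theta_0 in the notation used earlier
print(LR.coef_)       # slope(s), shape (n_targets, n_features)
print(LR.intercept_)  # intercept term(s)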
Gradient Descent is one of the most common methods for optimizing convex functions in Machine Learning. The cost function used here, J(θ) = (1/2m) Σ (hθ(xᵢ) − yᵢ)², differs from the Least Square cost above only by a constant factor of 1/2 (which does not change its minimizer), so Gradient Descent can be applied directly. We have to minimize this cost function to find the values of θ for the regression line.
The method of gradient descent can be represented as follows, repeating until convergence (α is the learning rate and m is the number of samples):

θ₀ := θ₀ − (α/m) Σ (hθ(xᵢ) − yᵢ)
θ₁ := θ₁ − (α/m) Σ (hθ(xᵢ) − yᵢ) · xᵢ
Because θ₀ and θ₁ must be updated simultaneously (both updates should use the current parameter values, not the freshly updated ones), we store the new values in temporary variables before assigning them:
import numpy as np
from matplotlib import pyplot as plt

# Cost function
def cost(z, theta, y):
    m, n = z.shape
    htheta = z.dot(theta.transpose())
    cost = ((htheta - y) ** 2).sum() / (2.0 * m)
    return cost

def gradient_descent(z, theta, alpha, y, itr):
    cost_arr = []
    m, n = z.shape
    count = 0
    while count < itr:
        htheta = z.dot(theta.transpose())
        a = alpha / m
        # Using temporary variables for simultaneous updating of the parameters
        temp0 = theta[0, 0] - a * (htheta - y).sum()
        temp1 = theta[0, 1] - a * ((htheta - y) * (z[:, 1:])).sum()
        theta[0, 0] = temp0
        theta[0, 1] = temp1
        cost_arr.append(float(cost(z, theta, y)))
        count += 1
    cost_log = np.array(cost_arr)
    plt.plot(np.linspace(0, itr, itr, endpoint=True), cost_log)
    plt.xlabel("No. of iterations")
    plt.ylabel("Error Function value")
    plt.show()
    return theta

x = np.array([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]])
y = np.array([[11], [13], [12], [15], [17], [18], [18], [19], [20], [22]])
m, n = x.shape
# Add a column of ones for the theta-0 (bias) term
z = np.ones((m, n + 1), dtype=int)
z[:, 1:] = x
theta = np.array([[21, 2]], dtype=float)
theta_minimised = gradient_descent(z, theta, 0.01, y, 10000)
new_x = np.array([1, 11])
predicted_y = new_x.dot(theta_minimised.transpose())
print(round(predicted_y[0], 4))
The third method finds θ using the Moore-Penrose pseudo-inverse (the normal equation). The equation in this case is:

θ = (XᵀX)⁻¹ Xᵀ y
It is implemented in the code below.
import numpy as np

# Input matrix
x = np.array([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]])
# Output matrix
y = np.array([[11], [13], [12], [15], [17], [18], [18], [19], [20], [22]])
m, n = x.shape

# Add a column of ones for the theta-0 (bias) term
z = np.ones((m, n + 1), dtype=int)
z[:, 1:] = x                            # z is the input matrix with the added 1s
mat = np.matmul(z.transpose(), z)       # product of z transpose and z
matinv = np.linalg.inv(mat)             # inverse of the above product
val = np.matmul(matinv, z.transpose())  # product of the inverse and z transpose
theta = np.matmul(val, y)               # theta = (X'X)^-1 X'y
new_x = np.array([1, 11])
predicted_y = new_x.dot(theta)
print(round(predicted_y[0], 4))
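Explicitly forming and inverting XᵀX can be numerically fragile; NumPy also provides np.linalg.pinv (which computes the Moore-Penrose pseudo-inverse directly) and np.linalg.lstsq, which solve the same problem more robustly. A minimal sketch, reusing z and y from the code above:

# Same normal-equation solution via the built-in pseudo-inverse
theta_pinv = np.linalg.pinv(z).dot(y)
print(theta_pinv)

# Or via the least-squares solver
theta_lstsq, residuals, rank, sv = np.linalg.lstsq(z, y, rcond=None)
print(theta_lstsq)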
Now that we’ve learned linear regression, let’s apply it to a real dataset. The dataset we will use is the Boston housing dataset.
It has 506 samples and 13 feature columns; the 14th column, the median house price, is the output. Below is sample code for the Boston dataset.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_boston

# Note: load_boston was removed in scikit-learn 1.2, so this requires an older version
X, y = load_boston(return_X_y=True)
print(X.shape)  # We can see that there are 13 features

LR = LinearRegression()
LR.fit(X, y)

# Finding the price for the given input
b = LR.predict([[0.00632, 18.0, 2.31, 0.0, 0.538, 6.57, 65.5, 4.09, 1, 296, 15.5, 396.9, 4.98]])
print(b)
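Since load_boston has been removed from recent scikit-learn releases, here is an alternative sketch (not from the original article) that runs the same end-to-end fit on the California housing dataset, which ships with scikit-learn; fetch_california_housing downloads the data on first use:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.datasets import fetch_california_housing

# 20640 samples, 8 features; target is the median house value in units of $100,000
X, y = fetch_california_housing(return_X_y=True)
print(X.shape)

LR = LinearRegression()
LR.fit(X, y)

# Predict the target for the first sample in the dataset
print(LR.predict(X[:1]))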