
R-square and Goodness of the Fit

Explore how to evaluate the goodness of fit in regression models using R-squared and adjusted R-squared. Understand their differences, limitations, and why adjusted R-squared accounts for model complexity to help you select better predictive models.

Let’s understand R-squared and the goodness of fit.

R-squared

The coefficient of determination, $R^2$ or $r^2$ (“R-squared”), is the proportion of the variance in the dependent variable that is predictable from the independent variable(s). It measures how well observed outcomes are replicated by the model, based on the proportion of the total variation in outcomes that the model explains.

The better the linear regression (on the right) fits the data compared to the simple average (on the left graph), the closer the value of $R^2$ is to 1. The areas of the blue squares represent the squared residuals with respect to the linear regression; the areas of the red squares represent the squared residuals with respect to the average value.

Key limitations of R-squared:

  • Can’t determine whether the coefficient estimates and predictions are biased, so we must also examine the residual plots.

  • Does not indicate whether a regression model is adequate. We can have a low R-squared value for a good model or a high R-squared value for a model that does not fit the data.

  • It’s possible to get a negative value of R-squared: if the fit is worse than a horizontal line at the mean, R-squared is negative. In this case, R-squared can’t be interpreted as the square of a correlation. Such situations often indicate that a constant (intercept) term should be added to the model.
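The negative case is easy to produce by hand. A minimal sketch, using made-up numbers where the predictions trend the wrong way:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

# Hypothetical values: the predictions decrease while the actual
# values increase, so the fit is worse than a horizontal line at
# the mean (3.0), and R-squared goes negative.
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([5.0, 4.0, 3.0, 2.0, 1.0])

print(r_squared(y_true, y_pred))  # -3.0
```

Predicting the mean everywhere would give exactly 0; anything below that is a model doing worse than no model at all.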

Now, let’s get into more detail about R-squared vs. adjusted R-squared.

R-squared vs. adjusted R-squared

Typically, the accuracy score $R^2$ is

$$\small \begin{align*} R^2 &= 1 - \frac{\text{Unexplained variance}}{\text{Total variance}}\\ &= 1 - \text{fraction of unexplained variance}\\ &= 1 - \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{\sum_{i=1}^n (y_i - \bar{y})^2} \end{align*}$$

$R^2 \sim 0.674$ means our trained model has captured or accounted for ~67.4% of the variability in the target (dependent variable). Our goal is to capture as much variability as possible, so $R^2$ closer to 1 is the target.
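The 0.674 above comes from the lesson’s trained model; as an illustration of the formula itself, here is how the same quantity can be computed by hand for a synthetic linear fit (all data below is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 3.0, 50)  # synthetic noisy target

slope, intercept = np.polyfit(x, y, 1)        # ordinary least-squares line
y_hat = slope * x + intercept

ss_res = np.sum((y - y_hat) ** 2)             # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)          # total variation around the mean
r2 = 1 - ss_res / ss_tot
print(f"R^2 = {r2:.3f}")
```

The ratio compares the model’s leftover error (`ss_res`) against the variation a mean-only baseline would leave (`ss_tot`).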

Note:

$R^2 = 0$: None of the variability of the response data around its mean has been captured or explained by the model.

$R^2 = 1$: All of the variability of the response data around its mean has been captured by the model.

Think about the mean line and the model fitted line to understand the variability or variance.

$R^2$ is a common metric and easily interpretable in terms of model accuracy in capturing the variance. It does not depend upon the scale of the target $y$. However, there are problems with $R^2$ that we’ll discuss next.
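The scale independence is easy to check: rescaling the target (say, prices in dollars vs. thousands of dollars) leaves $R^2$ unchanged. A minimal check with invented numbers:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

y = np.array([3.0, 5.0, 7.0, 9.0])        # made-up targets
y_hat = np.array([2.8, 5.3, 6.9, 9.2])    # made-up predictions

# Dividing both by 1000 scales SS_res and SS_tot by the same
# factor, so their ratio -- and hence R^2 -- is unchanged.
print(r_squared(y, y_hat))                # ~0.991
print(r_squared(y / 1000, y_hat / 1000))  # same value
```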

Note: It is important to understand that lower values of $R^2$ are not always bad.

Regression is not all about getting the best predictions all the time. In some fields, such as the study of human behavior, we don’t expect high scores; even 0.5 may be out of reach. If the goal is inference (reaching a conclusion based on evidence and reasoning), a score between 0.3 and 0.4 may be enough to measure a reliably reported effect. Indeed, machine learning is much more than just predictions.

Problem with R-squared

$R^2$ is a good metric for regression evaluation because it shows how well the data fits a curve or line. However, if we keep adding predictor variables, the value of $R^2$ will keep increasing (it never decreases), which is a problem. A related problem arises when the model has too many terms, such as higher-order polynomial terms. This can lead to an overfitted model, and the reported value of $R^2$ becomes misleading.
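A quick way to see the “never decreases” behavior is to keep appending pure-noise columns to a synthetic dataset and refit; the setup below is entirely made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 60
x = rng.uniform(0, 10, n)
y = 3.0 * x + rng.normal(0.0, 5.0, n)  # synthetic target

def ols_r2(X, y):
    """R^2 of an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

X = x.reshape(-1, 1)
scores = [ols_r2(X, y)]
for _ in range(4):
    X = np.column_stack([X, rng.normal(size=n)])  # a useless noise feature
    scores.append(ols_r2(X, y))

print([round(s, 4) for s in scores])  # non-decreasing, despite useless features
```

Least squares can always set a new coefficient to zero to recover the old fit, so each extra column can only shrink the residual sum of squares.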

This is where adjusted $R^2$ comes in: it adjusts the value based on the number of features/predictors ($p$) and the number of observations ($n$). It also depends upon the value of $R^2$, as shown in the formula:

$$R^2_{adj} = 1 - (1 - R^2)\left[\frac{n-1}{n-(p+1)}\right]$$

  • $n$ is the number of points in our data sample.
  • $p$ is the number of independent variables/features/regressors, excluding the constant.
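The formula translates directly into code. The $n$ and $p$ values below are assumptions chosen only to land near the lesson’s numbers, not the lesson’s actual dataset:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (excluding the constant)."""
    return 1 - (1 - r2) * (n - 1) / (n - (p + 1))

# Assumed sample size (n=100) and feature count (p=2) for illustration,
# applied to the R^2 ~ 0.674 from the text:
print(round(adjusted_r2(0.674, n=100, p=2), 3))  # 0.667
```

Note how the correction factor $(n-1)/(n-(p+1))$ exceeds 1 whenever $p \ge 1$, so the adjusted value always sits at or below the raw $R^2$.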

Note:

  • The gap between $R^2$ and $R^2_{adj}$ widens when newly added features/predictors bring in more complexity than power to predict the target variable.

  • $R^2$ assumes that every feature helps explain the variation in the target. $R^2_{adj}$, by contrast, reflects the percentage of variation explained by only those features that actually affect the target.

Let’s think about some features that are not important at all in predicting the house price, such as the buyer or seller’s age, sex, number of persons living in the house, and so on. In our trained model above:

  • $R^2 \sim 0.674$ and $R^2_{adj} \sim 0.669$.
  • Suppose, with the addition of one more feature (buyer sex => male/female), $R^2$ increases to 0.715 (71.5%).

Does it make sense? What if we add more features? This is misleading and a problem; the value of $R^2$ must be adjusted. Let’s assume the value of $R^2_{adj} \sim 0.652$ when we add the new feature (buyer sex => male/female) to our model for predicting the house price. Now, if we compare our models, the one without this additional feature is still the better model, with a higher value of $R^2_{adj}$, which makes sense.
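That comparison can be sketched on synthetic “house price” data. Everything here is invented for illustration: the prices are generated from floor area plus noise, and a random 0/1 column stands in for the irrelevant buyer-sex feature:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 40
area = rng.uniform(50, 200, n)
price = 1000.0 * area + rng.normal(0.0, 20000.0, n)  # made-up prices

def ols_r2(X, y):
    """R^2 of an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - (p + 1))

buyer_sex = rng.integers(0, 2, n).astype(float)      # irrelevant feature
r2_base = ols_r2(area.reshape(-1, 1), price)
r2_more = ols_r2(np.column_stack([area, buyer_sex]), price)

print(r2_base, adjusted_r2(r2_base, n, 1))
print(r2_more, adjusted_r2(r2_more, n, 2))
```

The raw $R^2$ can only go up with the extra column, while the adjusted value penalizes it for the added complexity, making the two models comparable on fairer terms.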

Note: Understanding the data and its features in the context of domain knowledge is very important in data science. We must constantly talk to our customers and experts with different backgrounds to improve the model performance. There is no universally best model, and we must account for everything to best use the available datasets.

$R^2$ shows how well the model’s terms fit a curve or line. Adjusted $R^2$ also indicates how well the terms fit a curve or line, but it adjusts for the number of terms in the model. If we add useless variables to a model, the adjusted R-squared will decrease; if we add useful variables, the adjusted R-squared will increase. Adjusted $R^2$ will always be less than or equal to $R^2$.