
R-square and Goodness of the Fit

Explore how to evaluate the goodness of fit in regression models using R-squared and adjusted R-squared. Understand their differences, limitations, and why adjusted R-squared accounts for model complexity to help you select better predictive models.

Let’s understand R-squared and the goodness of fit.

R-squared

The coefficient of determination, $R^2$ or $r^2$ (“R-squared”), is the proportion of the variance in the dependent variable that is predictable from the independent variable(s). It measures how well observed outcomes are replicated by the model, based on the proportion of the total variation in outcomes that the model explains.

The better the linear regression (on the right) fits the data compared to the simple average (on the left graph), the closer the value of $R^2$ is to 1. The areas of the blue squares represent the squared residuals with respect to the linear regression; the areas of the red squares represent the squared residuals with respect to the average value.

Key limitations of R-squared:

  • Can’t determine whether the coefficient estimates and predictions are biased, so we must also examine the residual plots.

  • Does not indicate whether a regression model is adequate. We can have a low R-squared value for a good model or a high R-squared value for a model that does not fit the data.

  • It’s possible to get a negative value of R-squared: if the fit is worse than a horizontal line at the mean, R-squared is negative. In this case, R-squared can’t be interpreted as the square of a correlation. Such situations often indicate that a constant (intercept) term should be added to the model.
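The negative case is easy to produce by hand. A minimal sketch, using made-up numbers where the predictions trend the wrong way:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

# Hypothetical values: the predictions decrease while the actual
# values increase, so the fit is worse than a horizontal line at
# the mean (3.0), and R-squared goes negative.
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([5.0, 4.0, 3.0, 2.0, 1.0])

print(r_squared(y_true, y_pred))  # -3.0
```

Predicting the mean everywhere would give exactly 0; anything below that is a model doing worse than no model at all.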

Now, let’s get into more detail about R-squared vs. adjusted R-squared.

R-squared vs. adjusted R-squared

Typically, the accuracy score $R^2$ is

$$\small \begin{align*} R^2 &= 1 - \frac{\text{Unexplained variance}}{\text{Total variance}}\\ &= 1 - \text{fraction of unexplained variance}\\ &= 1 - \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{\sum_{i=1}^n (y_i - \bar{y})^2} \end{align*}$$

$R^2 \sim 0.674$ means our trained model has captured or accounted for ~67.4% of the variability in the target (dependent variable). Our goal is to capture as much variability as possible, so $R^2$ closer to 1 is the target.
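The 0.674 above comes from the lesson’s trained model; as an illustration of the formula itself, here is how the same quantity can be computed by hand for a synthetic linear fit (all data below is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 3.0, 50)  # synthetic noisy target

slope, intercept = np.polyfit(x, y, 1)        # ordinary least-squares line
y_hat = slope * x + intercept

ss_res = np.sum((y - y_hat) ** 2)             # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)          # total variation around the mean
r2 = 1 - ss_res / ss_tot
print(f"R^2 = {r2:.3f}")
```

The ratio compares the model’s leftover error (`ss_res`) against the variation a mean-only baseline would leave (`ss_tot`).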

Note:

$R^2 = 0$: None of the variability of the response data around its mean has been captured or explained by the model.

$R^2 = 1$: All of the variability of the response data around its mean has been captured by the model.

Think about the mean line and the model fitted line to understand the variability or variance.

$R^2$ is a common metric and easily interpretable in terms of model accuracy in capturing the variance. It does not depend upon the scale of the target $y$. However, there are problems with $R^2$ that we’ll discuss next.
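The scale independence is easy to check: rescaling the target (say, prices in dollars vs. thousands of dollars) leaves $R^2$ unchanged. A minimal check with invented numbers:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

y = np.array([3.0, 5.0, 7.0, 9.0])        # made-up targets
y_hat = np.array([2.8, 5.3, 6.9, 9.2])    # made-up predictions

# Dividing both by 1000 scales SS_res and SS_tot by the same
# factor, so their ratio -- and hence R^2 -- is unchanged.
print(r_squared(y, y_hat))                # ~0.991
print(r_squared(y / 1000, y_hat / 1000))  # same value
```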

Note: It is important to understand that lower values of $R^2$ are not always bad.

Regression is not all about getting the best predictions all the time. In some fields, such as the study of human behavior, we don’t expect high scores; even 0.5 may be out of reach. If the goal is inference (reaching a conclusion based on evidence and reasoning), a score between 0.3 and 0.4 may be enough to measure a reliably reported effect. Indeed, machine learning is much more than just predictions.

Problem with R-squared

$R^2$ is a good metric for regression evaluation because it shows how well the data fits a curve or line. However, if we keep adding predictor variables, the value of $R^2$ will keep increasing (it never decreases), which is a problem. A related problem arises when the model has too many terms, such as higher-order polynomial terms. This can lead to an overfitted model, and the reported value of $R^2$ becomes misleading.
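A quick way to see the “never decreases” behavior is to keep appending pure-noise columns to a synthetic dataset and refit; the setup below is entirely made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 60
x = rng.uniform(0, 10, n)
y = 3.0 * x + rng.normal(0.0, 5.0, n)  # synthetic target

def ols_r2(X, y):
    """R^2 of an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

X = x.reshape(-1, 1)
scores = [ols_r2(X, y)]
for _ in range(4):
    X = np.column_stack([X, rng.normal(size=n)])  # a useless noise feature
    scores.append(ols_r2(X, y))

print([round(s, 4) for s in scores])  # non-decreasing, despite useless features
```

Least squares can always set a new coefficient to zero to recover the old fit, so each extra column can only shrink the residual sum of squares.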

This is where adjusted $R^2$ comes in: it adjusts the value based on the number of features/predictors ($p$) and the number of observations ($n$). It also depends upon the value of $R^2$, as shown in the formula:

$$R^2_{adj} = 1 - (1 - R^2)\left[\frac{n-1}{n-(p+1)}\right]$$

  • $n$ is the number of points in our data sample.
  • $p$ is the number of independent variables/features/regressors, excluding the constant.
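The formula translates directly into code. The $n$ and $p$ values below are assumptions chosen only to land near the lesson’s numbers, not the lesson’s actual dataset:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (excluding the constant)."""
    return 1 - (1 - r2) * (n - 1) / (n - (p + 1))

# Assumed sample size (n=100) and feature count (p=2) for illustration,
# applied to the R^2 ~ 0.674 from the text:
print(round(adjusted_r2(0.674, n=100, p=2), 3))  # 0.667
```

Note how the correction factor $(n-1)/(n-(p+1))$ exceeds 1 whenever $p \ge 1$, so the adjusted value always sits at or below the raw $R^2$.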

Note:

  • The gap between $R^2$ and $R^2_{adj}$ widens when newly added features/predictors bring in more complexity than power to predict the target variable.

  • $R^2$ assumes that every feature helps explain the variation in the target. $R^2_{adj}$, by contrast, reflects the percentage of variation explained by only those features that actually affect the target.

Let’s think about some features that are not important at all in predicting the house price, such as the buyer or seller’s age, sex, number of persons living in the house, and so on. In our trained model above:

  • $R^2 \sim 0.674$ and $R^2_{adj} \sim 0.669$.
  • Suppose, with the addition of one more feature (buyer sex => male/female), $R^2$ increases to 0.715 (71.5%).

Does it make sense? What if we add more features? This is misleading and a problem; the value of $R^2$ must be adjusted. Let’s assume the value of $R^2_{adj} \sim 0.652$ when we add the new feature (buyer sex => male/female) to our model for predicting the house price. Now, if we compare our models, the one without this additional feature is still the better model, with a higher value of $R^2_{adj}$, which makes sense.
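That comparison can be sketched on synthetic “house price” data. Everything here is invented for illustration: the prices are generated from floor area plus noise, and a random 0/1 column stands in for the irrelevant buyer-sex feature:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 40
area = rng.uniform(50, 200, n)
price = 1000.0 * area + rng.normal(0.0, 20000.0, n)  # made-up prices

def ols_r2(X, y):
    """R^2 of an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - (p + 1))

buyer_sex = rng.integers(0, 2, n).astype(float)      # irrelevant feature
r2_base = ols_r2(area.reshape(-1, 1), price)
r2_more = ols_r2(np.column_stack([area, buyer_sex]), price)

print(r2_base, adjusted_r2(r2_base, n, 1))
print(r2_more, adjusted_r2(r2_more, n, 2))
```

The raw $R^2$ can only go up with the extra column, while the adjusted value penalizes it for the added complexity, making the two models comparable on fairer terms.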

Note: Understanding the data and its features in the context of domain knowledge is very important in data science. We must constantly talk to our customers and experts with different backgrounds to improve the model performance. There is no universally best model, and we must account for everything to best use the available datasets.

$R^2$ shows how well the model’s terms fit a curve or line. Adjusted $R^2$ also indicates how well the terms fit a curve or line, but it adjusts for the number of terms in the model. If we add useless variables to a model, the adjusted R-squared will decrease; if we add useful variables, the adjusted R-squared will increase. Adjusted $R^2$ will always be less than or equal to $R^2$.