Numerical Variables Transformation

Learn to apply numerical variable transformations, including logarithm, reciprocal, power, and Box-Cox, to better meet the Gaussian distribution assumptions of models such as linear regression and logistic regression. Understand how to select and use these transformations to improve machine learning model performance.

Numerical variables transformation

In the lesson on probability distributions, we discussed Gaussian distributions and how some machine learning algorithms, such as linear regression and logistic regression, assume that the features follow a Gaussian distribution. If this assumption is not met, the model may perform poorly.

When a feature's distribution is not Gaussian, we can transform the feature to make its distribution more Gaussian-like. Machine learning models that assume Gaussian-distributed variables include:

  • Linear regression
  • Logistic regression
  • Linear discriminant analysis
  • Naive Bayes
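
To make "more Gaussian-like" concrete, here is a minimal sketch that measures the skewness of a feature before and after a transformation. Everything in it is an assumption for illustration: the data is synthetic (log-normal draws, which are strictly positive and right-skewed), the `skewness` helper is hypothetical, and the natural logarithm stands in as one example transformation. Skewness near zero is only a rough indicator of symmetry, not a full test of normality.

```python
import math
import random
import statistics


def skewness(values):
    """Population skewness (third standardized moment); near 0 for symmetric data."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


# Hypothetical right-skewed feature: log-normal draws are always > 0.
rng = random.Random(0)
feature = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]

# Natural-log transform; valid here because every value is strictly positive.
log_feature = [math.log(v) for v in feature]

skew_before = skewness(feature)
skew_after = skewness(log_feature)
print(f"skewness before: {skew_before:.2f}")
print(f"skewness after:  {skew_after:.2f}")
```

On real data, the same before-and-after comparison (ideally alongside a histogram or Q-Q plot) can help judge whether a given transformation actually moved the feature closer to a Gaussian shape.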

After analyzing the distribution of each feature in the dataset, we can apply one of the following transformations to it. A suitable transformation makes the feature more Gaussian-like, which can improve the results of the models listed above.

  • Logarithm transformation: This transformation applies the natural logarithm to each value of a feature. It can only be used on features whose values are all positive.

  • Reciprocal transformation ($\frac{1}{x}$ where $x$ ...