Numerical Variables Transformation

Numerical Variables Transformation refers to applying operations to the Numerical Columns of a dataset to improve the performance of Machine Learning models. In this lesson, you will learn when and how to apply these transformations.

In the lesson on Probability Distributions, while discussing Gaussian Distributions, we noted that Machine Learning algorithms like Linear Regression or Logistic Regression assume the Features’ underlying distribution to be Gaussian. If it is not, the model might perform poorly. In that case, we can apply Transformations to make the distribution more Gaussian-like. Machine Learning models that assume the underlying distribution of the variables to be Gaussian include:

  • Linear Regression
  • Logistic Regression
  • Linear Discriminant Analysis
  • Naive Bayes
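
Before applying any transformation, it helps to check how far a feature's distribution is from Gaussian. Below is a minimal sketch of one such check using skewness; the right-skewed feature is generated for illustration and is not part of the lesson's dataset.

```python
import numpy as np
from scipy.stats import skew

# Hypothetical right-skewed feature (e.g., incomes or purchase amounts).
feature = np.random.default_rng(0).lognormal(mean=3.0, sigma=1.0, size=1000)

# A skewness close to 0 suggests a roughly symmetric, Gaussian-like shape.
print("skewness before log transform:", skew(feature))
print("skewness after  log transform:", skew(np.log(feature)))
```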

We can apply the following transformations to the dataset’s individual features after analyzing them. These transformations can help us achieve better results by making the underlying features more Gaussian-like.

  • Logarithm Transformation: This transformation is used on features that contain only positive values. The logarithm used is the Natural Logarithm.

  • Reciprocal Transformation ($\frac{1}{x}$, where $x$ is one of the values of the feature): This transformation can be applied to negative values but not to the value $0$.

  • Square Root or Cube Root Transformation: This transformation comes under the category of Power Transformations and involves taking the power $x^{\frac{1}{2}}$ or $x^{\frac{1}{3}}$, where $x$ is an individual value of a feature.

  • Exponential or Power Transformations: This transformation involves raising an individual value of a feature to a power (i.e., $x^\lambda$), where $\lambda$ is any number. The goal is to try different values of $\lambda$ and see which works best for the case at hand. A short sketch of these transforms follows this list.

  • Box-Cox Transform: The Box-Cox Transform performs transformations for different values of the parameter $\lambda$. The boxcox() SciPy function implements the Box-Cox transformation. It takes an argument, lmbda (spelled this way to avoid Python’s lambda keyword), that controls the type of transform to perform; a usage sketch follows the list of common values below.
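
The sketch below applies the logarithm, reciprocal, square root, cube root, and a generic power transform element-wise with NumPy. The feature values and the exponent of 0.3 are made-up examples for illustration only.

```python
import numpy as np

# Hypothetical, strictly positive feature values.
x = np.array([1.0, 2.0, 5.0, 10.0, 50.0, 200.0])

log_x        = np.log(x)         # logarithm transform (natural log, x > 0)
reciprocal_x = 1.0 / x           # reciprocal transform (x != 0)
sqrt_x       = np.sqrt(x)        # square root transform, x**(1/2)
cbrt_x       = np.cbrt(x)        # cube root transform, x**(1/3)
power_x      = np.power(x, 0.3)  # generic power transform x**lambda, here lambda = 0.3

print(np.round(log_x, 3))
print(np.round(power_x, 3))
```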

Below are some common values of $\lambda$ for the Box-Cox transform:

  • $\lambda = -1$ is a reciprocal transform.
  • $\lambda = -0.5$ is a reciprocal square root transform.
  • $\lambda = 0.0$ is a log transform.
  • $\lambda = 0.5$ is a square root transform.
  • $\lambda = 1.0$ is no transform.
  • If $\lambda$ is not specified, an optimal value is chosen by the function based on the underlying distribution.
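
As a minimal usage sketch (the feature values below are made up), scipy.stats.boxcox can be called with a fixed lmbda or left to estimate one from the data. Note that for a fixed $\lambda \neq 0$, Box-Cox computes $(x^\lambda - 1)/\lambda$, which matches the corresponding power transform up to shifting and scaling.

```python
import numpy as np
from scipy.stats import boxcox

# Hypothetical, strictly positive feature (Box-Cox requires x > 0).
x = np.array([1.0, 2.0, 5.0, 10.0, 50.0, 200.0])

log_like  = boxcox(x, lmbda=0.0)  # exactly the natural-log transform
sqrt_like = boxcox(x, lmbda=0.5)  # square-root transform, up to shift and scale

# With no lmbda given, boxcox() also returns the lambda it estimated
# by maximum likelihood for this data.
x_transformed, best_lambda = boxcox(x)
print("estimated lambda:", best_lambda)
```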
