PCA (Principal Component Analysis)

In this lesson, learn how to use PCA for dimension reduction.

Curse of dimensionality

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces (often with hundreds or thousands of dimensions). The common theme amongst these problems is that when the dimensionality increases, the volume of the space increases so fast that the available data becomes sparse. This sparsity is problematic for any method that requires statistical significance. To obtain a statistically sound and reliable result, the amount of data needed to support the result often grows exponentially with the dimensionality.
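To make the sparsity concrete, here is a minimal sketch that scatters a fixed number of uniform random points over a grid and reports what fraction of grid cells contain at least one point. The function name `occupied_fraction`, the choice of 10 bins per axis, and the sample size of 1000 are illustrative assumptions, not part of the lesson.

```python
import random

def occupied_fraction(n_points, n_dims, bins_per_axis=10, seed=0):
    """Fraction of grid cells holding at least one of n_points uniform samples.

    Hypothetical helper for illustration: the unit cube is cut into
    bins_per_axis ** n_dims equally sized cells.
    """
    rng = random.Random(seed)
    occupied = set()
    for _ in range(n_points):
        # Map each coordinate in [0, 1) to a bin index along its axis.
        cell = tuple(int(rng.random() * bins_per_axis) for _ in range(n_dims))
        occupied.add(cell)
    total_cells = bins_per_axis ** n_dims
    return len(occupied) / total_cells

# Same number of points, increasing dimensionality.
for d in (1, 2, 3, 6):
    print(d, occupied_fraction(1000, d))
```

With 6 dimensions there are already a million cells, so 1000 points can cover at most 0.1% of them. Keeping the same coverage would require the sample size to grow by a factor of 10 for every added dimension.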

In real-world problems, the number of features often runs into the thousands or even millions, and there are rarely enough data points to get a reliable result. High dimensionality also causes trouble for distance calculations: distances between points tend to become nearly equal, so comparisons such as "nearest neighbor" lose much of their meaning.
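One way to see the problem with distances is to measure how different the nearest and farthest points are from a query point. The sketch below does this for uniform random points in the unit cube, using only the standard library. The function name `relative_contrast` and the sample sizes are assumptions chosen for illustration.

```python
import math
import random

def relative_contrast(n_points, n_dims, seed=0):
    """Return (farthest - nearest) / nearest Euclidean distance.

    Distances are measured from one random query point to n_points
    random points, all drawn uniformly from the unit cube.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

# A large value means "near" and "far" are clearly separated.
for d in (2, 10, 100, 1000):
    print(d, round(relative_contrast(200, d), 3))
```

As the dimension grows, the contrast typically shrinks toward zero: the farthest point is only slightly farther away than the nearest one, which is why distance-based methods struggle in high-dimensional spaces.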

One of the important ways to alleviate the curse of dimensionality is dimension reduction. Dimension reduction works because although the data points observed are high-dimensional, what is closely related to the actual task may only be a low-dimensional distribution, that is, a low-dimensional “embedding” in the high-dimensional space.
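The idea of a low-dimensional structure hidden in high-dimensional observations can be sketched with synthetic data. The example below assumes NumPy is available and builds hypothetical data in which 50 observed features are generated from only 2 latent factors plus a little noise; all sizes and the noise level are made up for illustration. It then checks how much of the total variance lies along the two strongest directions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 points driven by 2 latent factors, observed in 50 dimensions.
n_samples, n_latent, n_observed = 500, 2, 50
latent = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_observed))
noise = 0.01 * rng.normal(size=(n_samples, n_observed))
X = latent @ mixing + noise

# Center the data, then use the singular values to measure
# how much variance lies along each orthogonal direction.
X_centered = X - X.mean(axis=0)
singular_values = np.linalg.svd(X_centered, compute_uv=False)
explained_ratio = singular_values**2 / np.sum(singular_values**2)

top2_share = explained_ratio[:2].sum()
print(f"variance share of the top 2 directions: {top2_share:.4f}")
```

Although every point has 50 coordinates, nearly all of the variance is concentrated in two directions, so two coordinates are enough to describe the data almost without loss. This is the kind of structure that dimension reduction tries to recover.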

There are many dimension reduction methods, for example:

  • Supervised method: Linear Discriminant Analysis.
  • Unsupervised method: Principal Component Analysis.

In this lesson, we talk about Principal Component Analysis (PCA), which is one of the most widely used methods and is relatively simple.
