Curse of dimensionality

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces (often with hundreds or thousands of dimensions). The common theme amongst these problems is that when the dimensionality increases, the volume of the space increases so fast that the available data becomes sparse. This sparsity is problematic for any method that requires statistical significance. To obtain a statistically sound and reliable result, the amount of data needed to support the result often grows exponentially with the dimensionality.
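
To make the exponential growth concrete, here is a minimal sketch. It assumes a simple grid view of the space in which each axis is cut into 10 bins; the function names and the choice of 1,000 samples are illustrative assumptions, not part of any library.

```python
def cells_needed(bins_per_axis: int, dims: int) -> int:
    """Number of grid cells when each of `dims` axes is cut into `bins_per_axis` bins."""
    return bins_per_axis ** dims


def samples_per_cell(n_samples: int, bins_per_axis: int, dims: int) -> float:
    """Average number of samples that land in one cell for a fixed-size dataset."""
    return n_samples / cells_needed(bins_per_axis, dims)


# Hypothetical setup: 1,000 samples, 10 bins per axis.
for dims in (1, 2, 3, 10):
    print(dims, cells_needed(10, dims), samples_per_cell(1000, 10, dims))
```

With the dataset size held fixed, the average number of samples per cell falls by a factor of 10 for every added dimension, which is one way to picture the sparsity described above.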

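In real-world problems, the number of features often runs into the thousands or even millions, and there are rarely enough data points to get a reliable result. High dimensionality also causes trouble for distance calculations: as the number of dimensions grows, the distances between pairs of points tend to become nearly equal, so notions such as “nearest neighbor” lose much of their meaning.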
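
The difficulty with distances can be illustrated with a small experiment. This is a sketch under stated assumptions: points are drawn uniformly from the unit hypercube, distance is Euclidean, and `relative_contrast` is a hypothetical helper that measures how much farther the farthest point is than the nearest one, relative to the nearest.

```python
import math
import random


def relative_contrast(dims: int, n_points: int = 200, seed: int = 0) -> float:
    """(max distance - min distance) / min distance from one query point
    to `n_points` random points in the `dims`-dimensional unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dims)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dims)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


# The contrast between near and far points shrinks as dimensionality grows.
for dims in (2, 10, 100, 1000):
    print(dims, round(relative_contrast(dims), 3))
```

In low dimensions the nearest point is many times closer than the farthest one. In high dimensions the ratio shrinks toward zero, so a distance-based method has little signal left to separate “near” from “far”.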

One important way to alleviate the curse of dimensionality is dimension reduction. Dimension reduction works because, although the observed data points are high-dimensional, the part that matters for the actual task may lie on a low-dimensional structure, that is, a low-dimensional “embedding” in the high-dimensional space.
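
The idea of a low-dimensional embedding can be sketched with a toy dataset. The example assumes an idealized case: every 50-dimensional point is generated from a single hidden number along one known direction, with no noise. All names (`embed`, `project`, `unit`) are illustrative. This is not PCA itself, because the direction is known in advance here rather than estimated from the data.

```python
import math
import random

rng = random.Random(0)
DIMS = 50  # hypothetical ambient dimensionality

# A random unit-length direction in the 50-dimensional space.
raw = [rng.gauss(0.0, 1.0) for _ in range(DIMS)]
norm = math.sqrt(sum(c * c for c in raw))
unit = [c / norm for c in raw]


def embed(t: float) -> list:
    """Map one latent number to a 50-dimensional point on the line."""
    return [t * u for u in unit]


def project(point: list) -> float:
    """Reduce a 50-dimensional point to one number (dot product with `unit`)."""
    return sum(p * u for p, u in zip(point, unit))


latent = [rng.uniform(-5.0, 5.0) for _ in range(100)]
points = [embed(t) for t in latent]   # what we observe: 50 features per point
codes = [project(p) for p in points]  # what we keep: 1 feature per point

# The single kept feature recovers the latent value up to rounding error.
max_error = max(abs(t - c) for t, c in zip(latent, codes))
print(len(points[0]), max_error)
```

Each point has 50 coordinates, yet one number per point carries all of the information. Real data is noisier and the useful directions are unknown, which is the gap that dimension reduction methods try to fill.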

There are many dimension reduction methods, for example:

Supervised method: Linear Discriminant Analysis.
Unsupervised method: Principal Component Analysis.

In this lesson, we talk about Principal Component Analysis (PCA), which is one of the most common and relatively simple methods.
