
Principal Component Analysis

Explore Principal Component Analysis to understand and apply dimensionality reduction techniques that simplify complex datasets while minimizing information loss. This lesson covers the curse of dimensionality, PCA properties, eigenvalue decomposition, and a practical PCA implementation using the Iris dataset.

In the previous chapter, we explored ensemble learning, where combining multiple models improved predictive performance. While these models are powerful, working with high-dimensional feature spaces can still be challenging due to computational costs, data sparsity, and the risk of overfitting.

Now, it’s time to simplify the data without losing essential information. This is where dimensionality reduction comes in. Before we dive into techniques like PCA, let’s first understand the curse of dimensionality, which refers to the challenges that arise when dealing with high-dimensional datasets and why reducing dimensions is so valuable.

Curse of dimensionality

The curse of dimensionality in machine learning refers to the challenges and computational complexities that arise when working with a large number of features (a high-dimensional feature space). As the number of features or dimensions increases, the amount of data needed to find reliable, meaningful patterns also increases, driving up data and computational demands and raising the risk of overfitting.

Example

Consider a product recommendation system where each product is described by multiple features such as price, size, color, brand, and so on. As the number of features increases, the number of possible combinations grows exponentially, making it harder to find meaningful relationships between products and user preferences. The data points become sparse in this high-dimensional space, which makes accurate predictions more challenging and requires more data to avoid unreliable results, illustrating the curse of dimensionality.
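
To make this sparsity concrete, here is a minimal sketch (assuming NumPy; the point counts and dimensions are illustrative only) that samples a fixed number of random points in the unit hypercube and shows that even the nearest neighbour of a point drifts further away as the number of dimensions grows:

```python
# Illustrative sketch: with a fixed number of points, space becomes sparse
# as dimensionality grows, so even the nearest neighbour ends up far away.
import numpy as np

rng = np.random.default_rng(42)
n_points = 1000

for d in (2, 10, 100, 1000):
    X = rng.random((n_points, d))                  # random points in the unit hypercube
    dists = np.linalg.norm(X[1:] - X[0], axis=1)   # distances from the first point to the rest
    print(f"d = {d:4d} | nearest-neighbour distance ~ {dists.min():.3f}")
```

With two dimensions, the nearest neighbour is very close; with a thousand dimensions, it is far away, even though the number of points has not changed.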

It seems desirable to reduce the number of features while preserving the information. Does the term “compression” ring a bell?

Dimensionality reduction

Dimensionality reduction involves decreasing the number of features, either by selecting the most significant ones or by transforming them into a smaller set of new features. Not all dimensionality reduction methods aim to preserve information (so that the data can be reconstructed or decompressed); different methods can be built around different objectives.
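
As a quick illustration of these two routes, the sketch below (assuming scikit-learn, which this lesson does not prescribe) reduces the Iris data from four features to two, once by selecting the most informative original columns and once by transforming them into new ones; SelectKBest with f_classif is just one possible choice of selector.

```python
# Illustrative sketch: feature selection vs. feature transformation,
# both reducing the Iris data from 4 features to 2.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)                       # 150 samples, 4 features

# 1) Selection: keep the 2 original features most related to the target
X_selected = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)

# 2) Transformation: build 2 brand-new features (principal components)
X_transformed = PCA(n_components=2).fit_transform(X)

print(X_selected.shape, X_transformed.shape)            # (150, 2) (150, 2)
```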

PCA

Principal Component Analysis (PCA) is a dimensionality reduction technique that identifies key patterns and relationships within data by projecting it onto a lower-dimensional space while preserving as much variance (spread or information) as possible.

To understand PCA, we first need to understand dimensions. Imagine you’re in a video game where you can move forward, backward, left, and right. These are two dimensions. Now, imagine you can also fly up or dig down. That’s a third dimension. In data science, dimensions are like these directions, but they can be anything: age, height, income, and so on.

Note: We can visualize up to three dimensions easily, but what if we have more? That’s where PCA comes in. It helps us to reduce the number of dimensions while keeping the most important information intact.
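
Before digging into the mathematics, here is a minimal sketch of PCA in practice on the Iris dataset mentioned above, assuming scikit-learn (the library choice is ours, not mandated by the lesson): the four original features are projected onto two principal components, and we check how much of the original variance they retain.

```python
# Illustrative sketch: PCA on the Iris dataset, 4 features -> 2 components.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)                 # 150 samples, 4 features
X_scaled = StandardScaler().fit_transform(X)      # PCA is sensitive to feature scale

pca = PCA(n_components=2)
Z = pca.fit_transform(X_scaled)                   # projected data, shape (150, 2)

print("Reduced shape:", Z.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```

On Iris, the first two components retain most of the variance, which is exactly the “keep the most important information” idea described above.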

Properties of PCA

PCA operates by finding a new set of orthogonal (perpendicular) axes, called principal components ($PC_1, PC_2$, etc.), that are oriented in the directions where the data is most spread out.

To explain the essential properties of PCA, let’s take an example of $n$ data points in $d$-dimensional space forming the columns of the matrix $X_{d \times n}$. Furthermore, let the corresponding columns of the matrix $Z_{k \times n}$ represent the $k$-dimensional projections of the data points estimated using PCA.

Note: Dimensional projections refer to placing the original high-dimensional data points onto a new, simpler set of $k$ axes (the principal components) defined by the transformation matrix $\mathbf{W}$. This matrix $\mathbf{W}$ is constructed from the top $k$ eigenvectors (directions of maximum variance in the data, defining the principal components) of the covariance matrix (a matrix that measures how features vary together and captures their pairwise relationships). The goal is to capture the maximum spread of the data in this new space.
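
Under the column-per-data-point convention used here, this construction can be sketched directly with NumPy; the function name pca_projection_matrix and the random toy data are illustrative only.

```python
# Illustrative sketch: build the projection matrix W from the top-k
# eigenvectors of the covariance matrix, then project the data.
import numpy as np

def pca_projection_matrix(X, k):
    """X has shape (d, n): each column is one data point, as in the text."""
    X_centered = X - X.mean(axis=1, keepdims=True)       # center each feature
    cov = np.cov(X_centered)                             # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)               # eigenvalues in ascending order
    top_k = eigvecs[:, np.argsort(eigvals)[::-1][:k]]    # top-k eigenvectors, shape (d, k)
    return top_k.T                                       # W with shape (k, d)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 200))                            # toy data: d = 5, n = 200
W = pca_projection_matrix(X, k=2)
Z = W @ (X - X.mean(axis=1, keepdims=True))              # the projection Z = WX
print(W.shape, Z.shape)                                  # (2, 5) (2, 200)
```

In practice, libraries usually compute the same components through a singular value decomposition rather than an explicit eigendecomposition, but the resulting $\mathbf{W}$ plays the same role.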

Following are the key properties of PCA:

  • PCA is a linear method (the transformation): The transformation from the original high-dimensional data $\mathbf{X}$ to the reduced, $k$-dimensional data $\mathbf{Z}$ is a simple linear mapping (matrix multiplication). $\mathbf{W}_{k \times d}$ is the projection matrix whose rows are the chosen principal components.

$$Z = WX$$

  • The new axes are perpendicular (orthonormal bases): The rows of $\mathbf{W}$ (the principal components) are perfectly orthogonal (at $90^\circ$ angles) to each other, and each vector has a unit length (norm of 1).

Note: Because the principal components are orthonormal, they capture unique, non-overlapping information. If they weren’t perpendicular, the first component might repeat variance captured by the second, making the reduction inefficient.

  • Reconstruction is linear: The original data $\mathbf{X}$, denoted by $\hat{\mathbf{X}}$ when reconstructed from the reduced data $\mathbf{Z}$, can also be recovered linearly.

$$\hat{X} = W^T Z$$

  • PCA minimizes the reconstruction error: PCA is the optimal projection that minimizes the difference between the original data ($\mathbf{X}$) and the reconstructed data ($\hat{\mathbf{X}}$). The Frobenius norm ($\| \cdot \|_F^2$
...