Principal component analysis (PCA)

If a dataset has highly correlated features, the redundant information can skew the model outcome. This is known as the multicollinearity problem. Using Principal Component Analysis (PCA), we can reduce the number of attributes while retaining most of the original information.

PCA is a data transformation technique that combines existing features into new components. Each component is a linear combination of the original features, chosen to capture as much of the data variance as possible. The components are uncorrelated with each other and are ranked by the share of variance they explain. We can then select a subset of the transformed features (components) that represents most of the data variance.
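The transformation described above can be sketched with NumPy. This is a minimal illustration, not the lesson's own code: the helper name `pca`, the synthetic three-feature dataset, and the 95% variance threshold are all assumptions made for the example.

```python
import numpy as np


def pca(X):
    """Return (components, explained_variance_ratio, scores) for data matrix X.

    Hypothetical helper for illustration: rows are observations, columns are features.
    """
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = np.cov(Xc, rowvar=False)           # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # rank components by variance, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()          # share of total variance per component
    scores = Xc @ eigvecs                    # data expressed in component coordinates
    return eigvecs, ratio, scores


# Synthetic data (an assumption): two highly correlated features plus one unrelated feature.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([
    base + 0.05 * rng.normal(size=(200, 1)),
    2 * base + 0.05 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

components, ratio, scores = pca(X)

# Smallest number of components whose cumulative share reaches 95% of the variance.
k = int(np.searchsorted(np.cumsum(ratio), 0.95) + 1)
reduced = scores[:, :k]

print("explained variance ratio:", ratio.round(4))
print("components kept:", k)
```

Because the first two features carry nearly the same information, one component absorbs almost all of their variance, and the third original feature accounts for most of the rest. The covariance matrix of `scores` is diagonal up to rounding, which is what "uncorrelated components" means in practice.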

Let’s assume we have a dataset with two features (feature 1 and feature 2). PCA calculates the first component as the direction along which the projected data has maximum variance, which is equivalent to minimizing the sum of squared perpendicular distances from the observations to that line. Visually, PCA draws a line through the observations, similar to a regression line (the red line on slide 3), except that a regression line minimizes vertical distances rather than perpendicular ones. This line is the first component. ...
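To make the two-feature case concrete, the sketch below searches over candidate line directions and checks that the direction with maximum projected variance is also the one with minimum squared perpendicular error, and that both match the leading eigenvector of the covariance matrix. The synthetic data, the function names, and the 0.1-degree search grid are assumptions for illustration only.

```python
import numpy as np

# Synthetic two-feature dataset (an assumption): feature_2 depends on feature_1 plus noise.
rng = np.random.default_rng(1)
feature_1 = rng.normal(size=300)
feature_2 = 0.8 * feature_1 + 0.3 * rng.normal(size=300)
X = np.column_stack([feature_1, feature_2])
Xc = X - X.mean(axis=0)  # center the data so candidate lines pass through the mean


def projected_variance(angle):
    """Variance of the observations projected onto the line at the given angle."""
    direction = np.array([np.cos(angle), np.sin(angle)])
    return np.var(Xc @ direction)


def squared_perpendicular_error(angle):
    """Sum of squared perpendicular distances from the observations to the line."""
    direction = np.array([np.cos(angle), np.sin(angle)])
    along = Xc @ direction                      # coordinate of each point along the line
    residual = Xc - np.outer(along, direction)  # component perpendicular to the line
    return np.sum(residual ** 2)


# Brute-force search over line directions in 0.1 degree steps.
angles = np.linspace(0, np.pi, 1801)
variances = np.array([projected_variance(a) for a in angles])
errors = np.array([squared_perpendicular_error(a) for a in angles])
best_by_variance = angles[np.argmax(variances)]
best_by_error = angles[np.argmin(errors)]

# The same direction from the eigen-decomposition of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
first_component = eigvecs[:, -1]  # eigh sorts ascending, so the last column has the largest variance
pca_angle = np.arctan2(first_component[1], first_component[0]) % np.pi

print("angle maximizing variance:", best_by_variance)
print("angle minimizing perpendicular error:", best_by_error)
print("angle of first principal component:", pca_angle)
```

The two search criteria agree because, for centered data, the total sum of squares splits into the part along the line and the part perpendicular to it, so increasing one necessarily decreases the other.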
