Dimensionality Reduction with PCA
Explore Principal Component Analysis (PCA) to reduce dimensionality by combining correlated features into uncorrelated components. Learn to select components that capture most of the data variance for better customer segmentation using k-means clustering. Understand how PCA helps simplify data and improve clustering performance for marketing analytics.
Principal component analysis (PCA)
When a dataset contains highly correlated features, the redundant information can skew the model outcome. This is known as the multicollinearity problem. Using Principal Component Analysis (PCA), we can reduce the number of attributes while retaining most of the original information.
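Before applying PCA, it can help to confirm that the features really are redundant. The snippet below is a minimal sketch, assuming NumPy is available; the customer features (annual_spend, monthly_spend, age) and the 0.9 cutoff are made up for illustration and do not come from the lesson's dataset.

```python
import numpy as np

# Hypothetical customer features, generated only for illustration.
rng = np.random.default_rng(42)
annual_spend = rng.normal(1000, 200, size=300)
# monthly_spend is almost a rescaled copy of annual_spend, so it is redundant.
monthly_spend = annual_spend / 12 + rng.normal(0, 2, size=300)
age = rng.normal(40, 10, size=300)

X = np.column_stack([annual_spend, monthly_spend, age])

# Pairwise Pearson correlations between the columns of X.
corr = np.corrcoef(X, rowvar=False)

# Flag feature pairs whose absolute correlation exceeds an assumed cutoff.
threshold = 0.9
n_features = corr.shape[0]
redundant_pairs = [
    (i, j)
    for i in range(n_features)
    for j in range(i + 1, n_features)
    if abs(corr[i, j]) > threshold
]

print(np.round(corr, 2))
print("Highly correlated feature pairs (column indices):", redundant_pairs)
```

Pairs flagged this way carry largely the same information, which is the situation where PCA is most likely to reduce the number of attributes.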
PCA is a data transformation technique that combines existing features into new components, each chosen to capture as much of the remaining data variance as possible. The components are uncorrelated with each other, and PCA ranks them by the share of the total variance each one explains. We can then select a subset of these transformed features (components) that represents most of the data variance.
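The ranking and selection described above can be sketched with plain NumPy by taking the eigen-decomposition of the covariance matrix. This is an illustrative sketch, not the lesson's implementation: the synthetic data, the 95% variance threshold, and all variable names are assumptions. A library such as scikit-learn provides the same functionality through its PCA class.

```python
import numpy as np

# Synthetic data (an assumption for illustration): three features,
# the first two strongly correlated, the third unrelated.
rng = np.random.default_rng(0)
base = rng.normal(size=500)
X = np.column_stack([
    base + 0.1 * rng.normal(size=500),
    2 * base + 0.1 * rng.normal(size=500),
    rng.normal(size=500),
])

# Standardize so that every feature contributes on the same scale.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigen-decomposition of the covariance matrix: the eigenvectors are the
# component directions, the eigenvalues are the variances along them.
cov = np.cov(X_std, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order; rank components by variance.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Share of the total variance explained by each component.
explained_ratio = eigvals / eigvals.sum()
cumulative = np.cumsum(explained_ratio)

# Keep the smallest number of components reaching the assumed threshold.
threshold = 0.95
n_components = int(np.searchsorted(cumulative, threshold) + 1)

# Project the data onto the selected components.
components = X_std @ eigvecs[:, :n_components]

# The full set of components is mutually uncorrelated.
component_corr = np.corrcoef(X_std @ eigvecs, rowvar=False)

print("Explained variance ratio:", np.round(explained_ratio, 3))
print("Components kept:", n_components)
print("Reduced data shape:", components.shape)
```

Because two of the three synthetic features carry nearly the same information, a small number of components is enough to pass the threshold, and the reduced matrix can then be passed to k-means in place of the original features.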
Let’s assume we have a dataset with two features (feature 1 and feature 2). PCA tries to fit these two features and calculates the first component in such a way that the variance is maximum and the sum of squared ...