Goal of the lab

In this lab, we will learn how to use k-means for cluster analysis, applied to the Dotalicious data set. We will use the kmeans() function from R's built-in stats package, along with several other libraries.

We will use the caret package, short for Classification And REgression Training, in this lab and subsequent labs. It is a popular R package that provides a uniform interface to a wide range of ML algorithms and to the workflow around them, including preprocessing, training, and validation. Because caret standardizes and automates many phases of training and testing, it makes it easy to train and compare different ML models.

We include an installation step here, but the libraries may still fail to install because of version conflicts with your current setup, so check that each one installs and loads correctly. If you run into problems, please consult online resources.
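One way to check the setup before starting is sketched below. The helper name missing_packages and the contents of required_pkgs are assumptions made for illustration; only stats and caret are named in this lab, so extend the list with whatever other libraries the lab ends up using.

```r
# Packages assumed for this lab; "stats" ships with R and "caret" is named above.
# Extend this vector with any other libraries the lab requires.
required_pkgs <- c("stats", "caret")

# Hypothetical helper: return the packages from `pkgs` that cannot be loaded.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required_pkgs)

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install the missing packages from CRAN:
  # install.packages(to_install)
} else {
  message("All required packages are available.")
}
```

The install call is left commented out so that the check itself never changes your environment; uncomment it once you have confirmed which packages are missing.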

Brief refresher

K-means is a simple clustering algorithm that finds cluster centroids iteratively: it assigns each data point to its nearest centroid, recomputes each centroid as the mean of the points assigned to it, and repeats until the assignments no longer change.
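As a quick illustration of that loop, here is a minimal sketch that runs kmeans() from the stats package on a small synthetic data set. The toy data and the variable names toy and fit are made up for this example and are not the Dotalicious data. The call passes algorithm = "Lloyd" because that variant follows the assign-then-update loop described above; the default in kmeans() is the Hartigan-Wong variant.

```r
set.seed(42)  # make the random data and the random starting centroids reproducible

# Hypothetical toy data: two well-separated 2-D blobs of 50 points each.
toy <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 5, sd = 0.5), ncol = 2)
)

# Fit k-means with k = 2. nstart = 10 repeats the run from 10 random starts
# and keeps the solution with the lowest total within-cluster sum of squares.
fit <- kmeans(toy, centers = 2, nstart = 10, algorithm = "Lloyd")

fit$centers       # one row per cluster centroid
fit$cluster       # cluster label (1 or 2) for each row of toy
fit$tot.withinss  # total within-cluster sum of squares of the chosen solution
```

Cluster labels are arbitrary: which blob ends up labeled 1 and which 2 depends on the random start, so compare clusters by their centroids rather than by their label numbers.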
