Balanced Iterative Reducing and Clustering using Hierarchies, or simply BIRCH,
is a clustering method for unlabeled datasets. In scikit-learn, the algorithm is
implemented by the sklearn.cluster.Birch class.
It is a memory-efficient, online learning algorithm. It builds a tree data structure, the CF tree, and reads the cluster centroids off its leaves. These centroids can be used as the final clusters or provided as the input to another clustering algorithm, such as AgglomerativeClustering.
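The online behaviour and the hand-off to AgglomerativeClustering can be sketched as follows. This is a minimal illustration, not a reference implementation: the synthetic data, the chunk count of 6, and the choice of 4 final clusters are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, Birch
from sklearn.datasets import make_blobs

# Synthetic data standing in for a stream (assumed for illustration).
X, _ = make_blobs(n_samples=600, centers=4, cluster_std=0.5, random_state=0)

# Pass an AgglomerativeClustering instance as the final clustering step.
model = Birch(threshold=0.5, n_clusters=AgglomerativeClustering(n_clusters=4))

# Feed the data in chunks; partial_fit updates the existing CF tree
# instead of rebuilding it from scratch.
for chunk in np.array_split(X, 6):
    model.partial_fit(chunk)

labels = model.predict(X)
n_subclusters = model.subcluster_centers_.shape[0]
print(f"{n_subclusters} leaf sub-clusters grouped into "
      f"{len(np.unique(model.subcluster_labels_))} final clusters")
```

The leaf centroids in subcluster_centers_ are what the final clustering step receives, so the final step works on far fewer points than the raw dataset.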
Note: BIRCH is often preferred for large datasets because many other clustering algorithms scale poorly and cannot run within limited resources (CPU or memory).
sklearn.cluster.Birch(*,
threshold=0.5,
branching_factor=50,
n_clusters=3,
compute_labels=True,
copy=True
)
*: Marks every parameter that follows as keyword-only, so each one must be passed by name.
threshold: This parameter is a float and its default value is 0.5. The radius of the sub-cluster obtained by merging a new sample with its closest sub-cluster must stay below this value; otherwise, a new sub-cluster is started.
branching_factor: This parameter is an integer and its default value is 50. It defines the maximum number of CF sub-clusters in each node.
n_clusters: This parameter is an integer, an instance of a sklearn.cluster model, or None, and its default value is 3. It sets the number of clusters returned after the final clustering step of the BIRCH algorithm. If this parameter is set to None, the final clustering step is not performed and the algorithm returns the intermediate sub-clusters.
compute_labels: This parameter is a boolean and its default value is True. It determines whether labels are computed for each fit.
copy: This parameter is a boolean and its default value is True. It determines whether or not to make a copy of the input data.
    True: Create a copy of the input data and then perform BIRCH.
    False: Perform BIRCH on the input data directly.
The call returns a Birch estimator. After fitting, its labels_ attribute holds the cluster label of each sample, as assigned by the last clustering step of the BIRCH algorithm.
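To make the role of threshold more concrete, the sketch below fits the same data with a few threshold values and counts the resulting leaf sub-clusters. The specific values (0.3, 1.0, 2.0) and the synthetic dataset are assumptions chosen only for illustration; n_clusters=None is used so that the raw sub-clusters are visible.

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Synthetic data (assumed for illustration).
X, _ = make_blobs(n_samples=500, centers=7, cluster_std=0.65, random_state=0)

subcluster_counts = {}
for threshold in (0.3, 1.0, 2.0):
    # n_clusters=None skips the final clustering step,
    # so the model exposes the leaf sub-clusters directly.
    model = Birch(threshold=threshold, branching_factor=50, n_clusters=None)
    model.fit(X)
    subcluster_counts[threshold] = len(model.subcluster_centers_)

for threshold, count in subcluster_counts.items():
    print(f"threshold={threshold}: {count} sub-clusters")
```

A smaller threshold keeps each sub-cluster tight, which generally produces more sub-clusters; a larger one lets sub-clusters absorb more samples.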
# Import the required modules and libraries
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import Birch
# Generate 500 samples using the make_blobs function
dataset, clusters = make_blobs(n_samples=500, centers=7, cluster_std=0.65, random_state=0)
# Create the BIRCH clustering model
model = Birch(branching_factor=60, n_clusters=None, threshold=1.0)

# Fit the model on the data
model.fit(dataset)
# Predict cluster labels for the same data
predicted = model.predict(dataset)
# Create a scatter plot colored by predicted cluster
plt.scatter(dataset[:, 0], dataset[:, 1], c=predicted, alpha=0.8)
plt.show()
Line 6: We generate 500 samples using the make_blobs function.
Line 8: We use branching_factor = 60 and threshold = 1.0 to create the BIRCH clustering model. Because n_clusters = None, the final clustering step is skipped and the sub-clusters are returned as they are.
Line 11: We fit the model on the training dataset.
Line 13: We use model.predict(dataset) to predict the cluster label of each sample based on the trained model.
Line 15: We draw a scatter plot of the samples, colored by the clusters that the BIRCH algorithm predicted.
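As a follow-up to the example above, the fitted model can be inspected without plotting. The sketch below repeats the same setup and reads the subcluster_centers_ and labels_ attributes; it is an illustrative check, assuming the same synthetic data as the example.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Same data and model settings as the example above.
dataset, _ = make_blobs(n_samples=500, centers=7, cluster_std=0.65, random_state=0)
model = Birch(branching_factor=60, n_clusters=None, threshold=1.0).fit(dataset)
predicted = model.predict(dataset)

# With n_clusters=None, each label is the index of a leaf sub-cluster.
n_subclusters = model.subcluster_centers_.shape[0]

# Labels computed during fit match predict() on the same data.
same_as_fit_labels = np.array_equal(predicted, model.labels_)

print(f"sub-clusters: {n_subclusters}")
print(f"predict() matches labels_: {same_as_fit_labels}")
```

Since the final clustering step is skipped here, the number of distinct labels may differ from the 7 centers used to generate the data.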