What is the sklearn.cluster.Birch() function in Python?
Overview
BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) is a clustering method for unlabeled datasets. In scikit-learn, the algorithm is implemented by the sklearn.cluster.Birch class.
BIRCH is memory-efficient and supports online learning. It builds a tree data structure, the clustering feature (CF) tree, and the cluster centroids are read off the leaves of this tree. These centroids can be used as the final clusters, or they can be provided as the input for another clustering algorithm such as AgglomerativeClustering.
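The two ideas above, online learning and passing the centroids to AgglomerativeClustering, can be sketched as follows. This is a minimal illustration rather than a reference implementation: the synthetic dataset, the batch count, and the parameter values are assumptions chosen only for the demonstration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, Birch
from sklearn.datasets import make_blobs

# Synthetic data: 600 points around 3 centers (illustrative values)
X, _ = make_blobs(n_samples=600, centers=3, cluster_std=0.6, random_state=0)

# n_clusters=None keeps only the CF-tree sub-clusters (no final clustering step)
birch = Birch(threshold=0.5, n_clusters=None)

# Online learning: feed the data in three batches instead of all at once
for batch in np.array_split(X, 3):
    birch.partial_fit(batch)

# Centroids read off the leaves of the CF tree
centroids = birch.subcluster_centers_
print(centroids.shape)

# Provide the centroids as input to AgglomerativeClustering
agg = AgglomerativeClustering(n_clusters=3)
centroid_labels = agg.fit_predict(centroids)

# Map each sample to the cluster of its nearest sub-cluster centroid
sample_labels = centroid_labels[birch.predict(X)]
print(np.unique(sample_labels))
```

Because the second algorithm only sees the centroids, it works on far fewer points than the original dataset contains.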
Note: BIRCH is useful when a dataset is too large for clustering algorithms that need all of the data in memory at once, because it can run with limited resources (CPU or memory).
Syntax
sklearn.cluster.Birch(*,
    threshold=0.5,
    branching_factor=50,
    n_clusters=3,
    compute_labels=True,
    copy=True
)
Parameters
*: This marker is not a parameter. It indicates that all the parameters after it must be passed as keyword arguments.
threshold: This parameter is a float and its default value is 0.5. It is the maximum radius of a sub-cluster. A new sample is merged into its closest sub-cluster only if the radius of the merged sub-cluster stays below the threshold. Otherwise, a new sub-cluster is started.
branching_factor: This parameter is an integer and its default value is 50. It defines the maximum number of CF sub-clusters in each node.
n_clusters: This parameter is an integer, an instance of a sklearn.cluster model, or None, and its default value is 3. It sets the number of clusters returned after the final clustering step of the BIRCH algorithm. If this parameter is set to None, the final clustering step is not performed and the algorithm returns the intermediate sub-clusters.
compute_labels: This parameter is boolean and its default value is True. It determines whether or not to compute labels for each fit.
copy: This parameter is boolean and its default value is True. It determines whether or not to make a copy of the input data.
- Set copy=True: Create a copy of the input data and then perform BIRCH.
- Set copy=False: Perform BIRCH on the input data directly, which may overwrite it.
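To make the role of threshold more concrete, the sketch below fits the same data with three threshold values and counts the resulting sub-clusters. The dataset and the threshold values are assumptions picked for illustration. The exact counts depend on the data, but a larger threshold generally produces fewer sub-clusters.

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Synthetic data: 500 points around 4 centers (illustrative values)
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.7, random_state=42)

counts = {}
for threshold in (0.3, 1.0, 2.0):
    # n_clusters=None exposes the raw sub-clusters of the CF tree
    model = Birch(threshold=threshold, branching_factor=50, n_clusters=None)
    model.fit(X)
    counts[threshold] = len(model.subcluster_centers_)
    print(f"threshold={threshold}: {counts[threshold]} sub-clusters")
```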
Return Value
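Calling Birch() returns a BIRCH clustering model (an estimator object), not a number of clusters. After the model is fitted, the cluster label of each training sample is available in its labels_ attribute, and predict() returns the cluster labels for the samples passed to it.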
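A short sketch of what comes back from the constructor and from fit(), assuming a small synthetic dataset created only for this illustration:

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Synthetic data: 300 points around 3 centers (illustrative values)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=1)

model = Birch(n_clusters=3)   # the constructor returns an unfitted estimator
fitted = model.fit(X)         # fit() returns the same estimator object

print(type(model).__name__)   # Birch
print(fitted is model)        # True
print(model.labels_.shape)    # (300,) one label per training sample
```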
Example
# Importing the relevant and necessary modules and libraries
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import Birch
# Generates the 500 samples using the make_blobs function
dataset, clusters = make_blobs(n_samples=500, centers=7, cluster_std=0.65, random_state=0)
# Creates the BIRCH clustering model (n_clusters=None skips the final clustering step)
model = Birch(branching_factor=60, n_clusters=None, threshold=1.0)

# Data training
model.fit(dataset)
# Predicting the same data
predicted = model.predict(dataset)
# Creates a scatter plot colored by predicted cluster
plt.scatter(dataset[:, 0], dataset[:, 1], c=predicted, alpha=0.8)
plt.show()
Explanation
Line 6: We generate 500 samples using the make_blobs method.
Line 8: We use branching_factor=60 and threshold=1.0 to create the BIRCH clustering model. Because n_clusters is None, the sub-clusters are returned as the final clusters.
Line 11: We fit the model on the training dataset.
Line 13: We use model.predict(dataset) to assign each sample to a cluster based on the trained model.
Line 15: We draw a scatter plot of the samples, colored by the clusters that the BIRCH algorithm predicted.