What is the sklearn.cluster.Birch() function in Python?

Overview

Balanced Iterative Reducing and Clustering using Hierarchies, or simply BIRCH, is a clustering algorithm for unlabeled datasets. In scikit-learn, it is implemented by the sklearn.cluster.Birch class.

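It is a memory-efficient, online learning algorithm. It builds a tree data structure, called a clustering feature (CF) tree, that summarizes the data, and the cluster centroids are read from the leaves of that tree. These centroids can either serve as the final clusters or be provided as input to another clustering algorithm, such as AgglomerativeClustering.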
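The online behavior described above can be sketched with partial_fit(), which updates the existing tree one batch at a time. This is a minimal illustration, not a benchmark: the synthetic data, the batch count, and the parameter values are assumptions chosen only for the example.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Synthetic data standing in for a stream that arrives in chunks (assumption)
X, _ = make_blobs(n_samples=600, centers=3, cluster_std=0.5, random_state=0)

model = Birch(threshold=0.5, n_clusters=3)

# Feed the data in six batches; each call updates the same CF tree
for batch in np.array_split(X, 6):
    model.partial_fit(batch)

# Assign every sample to the cluster of its nearest sub-cluster centroid
labels = model.predict(X)

print(len(model.subcluster_centers_), "sub-clusters summarize", len(X), "samples")
print("cluster labels found:", np.unique(labels))
```

Because only the sub-cluster summaries are kept between calls, the full dataset never has to be held in memory at once.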

Note: BIRCH is preferred for large datasets because many clustering algorithms need the whole dataset in memory at once and become inefficient under limited resources (CPU or memory), whereas BIRCH summarizes the data incrementally.

Syntax

sklearn.cluster.Birch(*,
    threshold=0.5,
    branching_factor=50,
    n_clusters=3,
    compute_labels=True,
    copy=True
  )

Parameters

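*: This is not a parameter. It marks every argument that follows it as keyword-only, so each one must be passed by name.
threshold: This parameter is a float and its default value is 0.5. The radius of the sub-cluster obtained by merging a new sample with its closest sub-cluster must be smaller than this value. Otherwise, a new sub-cluster is started. A lower value produces more, smaller sub-clusters.
branching_factor: This parameter is an integer and its default value is 50. It defines the maximum number of CF sub-clusters in each node. If adding a new sample would exceed this limit, the node is split into two.
n_clusters: This parameter can be an integer, an instance of a sklearn.cluster model, or None, and its default value is 3. It controls the final clustering step, which treats the sub-clusters from the leaves as new samples. If it is an integer, AgglomerativeClustering is used with that number of clusters. If it is a model instance, that model is fit on the sub-clusters. If it is set to None, the final clustering step is not performed and the algorithm returns the intermediate sub-clusters.
compute_labels: This parameter is boolean and its default value is True. It determines whether or not labels are computed for each fit.
copy: This parameter is boolean and its default value is True. It determines whether or not to make a copy of the input data. Recent scikit-learn releases mark this parameter as deprecated, so check the documentation for your installed version.
- Set copy=True: Create a copy of the input data and then perform BIRCH.
- Set copy=False: Perform BIRCH on the input data directly, which may overwrite it.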
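To make the threshold and n_clusters parameters more concrete, the sketch below counts the sub-clusters produced at a few threshold values and then passes a model instance as n_clusters. The dataset, the threshold values, and the average linkage setting are assumptions picked for illustration. Exact counts depend on the data and the library version, so none are shown.

```python
from sklearn.cluster import AgglomerativeClustering, Birch
from sklearn.datasets import make_blobs

# Illustrative dataset (assumption): four well-populated blobs in 2D
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.6, random_state=42)

# With n_clusters=None the final clustering step is skipped, so the
# number of sub-cluster centroids shows the effect of threshold directly
counts = {}
for threshold in (0.3, 1.0, 2.0):
    model = Birch(threshold=threshold, n_clusters=None).fit(X)
    counts[threshold] = len(model.subcluster_centers_)
    print(f"threshold={threshold}: {counts[threshold]} sub-clusters")

# n_clusters may also be a clustering model that is fit on the sub-clusters
final_step = AgglomerativeClustering(n_clusters=4, linkage="average")
refined = Birch(threshold=0.5, n_clusters=final_step).fit(X)
print("distinct final labels:", len(set(refined.labels_)))
```

A small threshold keeps sub-clusters tight and therefore numerous, while a large one lets each sub-cluster absorb more samples.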

Return Value

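Calling sklearn.cluster.Birch() returns an unfitted Birch estimator, not the clusters themselves. After fit() is called, the cluster label of each training sample is available in the labels_ attribute, and predict() returns the cluster label of each sample passed to it.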
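As a rough illustration of what a fitted estimator exposes, the sketch below fits a model and inspects a few of its attributes. The data and parameter values are assumptions made for the example.

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Illustrative data (assumption): 300 samples around three centers
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=1)

model = Birch(threshold=0.5, n_clusters=3)
fitted = model.fit(X)  # fit() returns the estimator itself

print(fitted is model)                   # True
print(model.labels_.shape)               # (300,), one label per training sample
print(model.subcluster_centers_.shape)   # (number of sub-clusters, 2)
print(model.subcluster_labels_.shape)    # one final label per sub-cluster
```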

Example

# Import the required modules and libraries
import matplotlib.pyplot as plt
# make_blobs lives in sklearn.datasets; the old samples_generator module was removed
from sklearn.datasets import make_blobs
from sklearn.cluster import Birch
# Generate 500 samples around 7 centers using the make_blobs function
dataset, clusters = make_blobs(n_samples=500, centers=7, cluster_std=0.65, random_state=0)
# Create the BIRCH clustering model
# n_clusters=None skips the final clustering step and keeps the sub-clusters
model = Birch(branching_factor=60,
              n_clusters=None, threshold=1.0)
# Fit the model to the data
model.fit(dataset)
# Predict the sub-cluster label of each sample in the same data
predicted = model.predict(dataset)
# Create a scatter plot colored by predicted label and display it
plt.scatter(dataset[:, 0], dataset[:, 1], c=predicted, alpha=0.8)
plt.show()

License: Creative Commons-Attribution-ShareAlike 4.0 (CC-BY-SA 4.0)