What is the sklearn.cluster.Birch() function in Python?

Overview

Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) is a clustering method for unlabeled datasets. In scikit-learn, the algorithm is implemented by the sklearn.cluster.Birch class.

BIRCH is memory-efficient and supports online learning. It builds a tree data structure, the clustering feature (CF) tree, whose leaf nodes hold sub-clusters. The centroids of these sub-clusters can be read from the leaves and provided as input to another clustering algorithm, such as AgglomerativeClustering.
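
The hand-off of centroids to AgglomerativeClustering can be sketched as follows. This is a minimal illustration rather than a definitive recipe: the synthetic dataset, the choice of 7 final clusters, and the variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, Birch
from sklearn.datasets import make_blobs

# Synthetic data for illustration only (assumed, not from the article)
X, _ = make_blobs(n_samples=500, centers=7, cluster_std=0.65, random_state=0)

# n_clusters=None stops after building the CF tree, leaving only sub-clusters
birch = Birch(threshold=0.5, branching_factor=50, n_clusters=None).fit(X)
centers = birch.subcluster_centers_  # one centroid per leaf sub-cluster

# Cluster the (much smaller) set of centroids instead of the raw samples
agg = AgglomerativeClustering(n_clusters=7).fit(centers)

# With n_clusters=None, predict() returns the index of the nearest sub-cluster,
# so it can be used to look up that sub-cluster's final label
sample_labels = agg.labels_[birch.predict(X)]

print(centers.shape, np.unique(sample_labels).size)
```

The Birch class can also do this internally: passing an integer or a sklearn.cluster estimator as n_clusters runs the final clustering step on the sub-clusters for you.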

Note: BIRCH is often preferred for large datasets when resources (CPU or memory) are limited, because many other clustering algorithms need the whole dataset in memory and become inefficient in that setting.
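
When the data does not fit in memory, the model can be trained one batch at a time with partial_fit(). The sketch below simulates this by splitting an in-memory array into six chunks; the dataset and batch count are assumptions for illustration, and in practice each batch would come from disk or a stream.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Synthetic data standing in for a stream of batches (assumed for the example)
X, _ = make_blobs(n_samples=600, centers=4, cluster_std=0.6, random_state=42)

model = Birch(threshold=0.5, branching_factor=50, n_clusters=None)

# Feed the data in chunks; the CF tree is updated incrementally on each call
for batch in np.array_split(X, 6):
    model.partial_fit(batch)

n_subclusters = len(model.subcluster_centers_)
labels = model.predict(X)  # index of the nearest sub-cluster for each sample
print(n_subclusters, labels.shape)
```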

Syntax


sklearn.cluster.Birch(*,
    threshold=0.5,
    branching_factor=50,
    n_clusters=3,
    compute_labels=True,
    copy=True
)

Parameters

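  • *: This marker is not a parameter itself. It indicates that all the parameters after it must be passed as keyword arguments.
  • threshold: This parameter is a float and its default value is 0.5. The radius of the sub-cluster obtained by merging a new sample with its closest sub-cluster must be smaller than this value. Otherwise, a new sub-cluster is started. Lower values produce more, smaller sub-clusters.
  • branching_factor: This parameter is an integer and its default value is 50. It defines the maximum number of CF sub-clusters in each node. If adding a new sample makes a node exceed this number, the node is split in two.
  • n_clusters: This parameter is an integer, an instance of a sklearn.cluster model, or None, and its default value is 3. It sets the number of clusters returned by the final clustering step of the BIRCH algorithm. If an estimator instance is passed, that model is fit on the sub-clusters. If this parameter is set to None, the final clustering step is not performed and the algorithm returns the intermediate sub-clusters.
  • compute_labels: This parameter is a boolean and its default value is True. It determines whether or not to compute labels for each fit.
  • copy: This parameter is a boolean and its default value is True. It determines whether or not to make a copy of the input data. Recent scikit-learn releases (1.6 onward) deprecate this parameter, so check the documentation for your installed version.
    • Set copy=True: Create a copy of the input data and then perform BIRCH.
    • Set copy=False: Perform BIRCH on the input data directly, without copying it first.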
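
To see how threshold affects the tree, the sketch below fits two models that differ only in that parameter and counts the resulting sub-clusters. scikit-learn documents threshold as a bound on the radius of a sub-cluster after it absorbs a new sample, so a smaller value tends to produce more sub-clusters. The dataset and the two threshold values are assumptions chosen for illustration.

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Synthetic data for illustration only (assumed, not from the article)
X, _ = make_blobs(n_samples=500, centers=7, cluster_std=0.65, random_state=0)

counts = {}
for threshold in (0.3, 1.5):
    # n_clusters=None exposes the raw sub-clusters of the CF tree
    model = Birch(threshold=threshold, n_clusters=None).fit(X)
    counts[threshold] = len(model.subcluster_centers_)

# The tighter threshold yields the larger number of sub-clusters
print(counts)
```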

Return Value

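Calling sklearn.cluster.Birch() returns an unfitted Birch estimator. After fit() is called, the cluster label of each training sample is available in the labels_ attribute, and predict() returns the cluster labels for the samples passed to it.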
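
The short sketch below shows what each call gives back in practice. The dataset and variable names are assumptions for illustration.

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Synthetic data for illustration only (assumed, not from the article)
X, _ = make_blobs(n_samples=200, centers=3, cluster_std=0.6, random_state=1)

model = Birch(n_clusters=3)      # an unfitted estimator object
fitted = model.fit(X)            # fit() returns the estimator itself
labels = model.labels_           # one label per training sample
new_labels = model.predict(X[:5])  # labels for the samples passed in

print(type(model).__name__, fitted is model, labels.shape, new_labels.shape)
```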

Example

# Importing the relevant and necessary modules and libraries
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import Birch
# Generates 500 samples around 7 centers using the make_blobs function
dataset, clusters = make_blobs(n_samples=500, centers=7, cluster_std=0.65, random_state=0)
# Creates the BIRCH clustering model (n_clusters=None skips the final clustering step)
model = Birch(branching_factor=60,
              n_clusters=None, threshold=1.0)
# Fits the model to the data
model.fit(dataset)
# Predicts the cluster label of each sample in the same data
predicted = model.predict(dataset)
# Creates a scatter plot colored by predicted label
plt.scatter(dataset[:, 0], dataset[:, 1], c=predicted, alpha=0.8)
# Displays the plot
plt.show()

Explanation

  • Line 6: We generate 500 samples grouped around 7 centers using the make_blobs function.
  • Line 8: We create the BIRCH clustering model with branching_factor = 60, threshold = 1.0, and n_clusters = None, so the model returns its sub-clusters directly.
  • Line 11: We fit the model on the dataset.
  • Line 13: We use model.predict(dataset) to assign each sample to its nearest sub-cluster.
  • Line 15: We draw a scatter plot of the samples, colored by their predicted cluster labels.
