Trusted answers to developer questions

What is PySpark MLlib?

AKASH BAJWA

MLlib is Apache Spark's machine learning library. In Python, we can use it through PySpark, Spark's Python API. It offers numerous machine learning algorithms, both supervised and unsupervised. In this shot, we list some of the most commonly used MLlib modules.

The spark.mllib Library

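spark.mllib is MLlib's original, RDD-based API. The goal of the library is to make practical machine learning scalable and easy. Its recommendation algorithms, for example, use model-based collaborative filtering, in which users and products are described by a small set of latent factors. These latent factors can be learned with the ALS (Alternating Least Squares) algorithm. Since Spark 2.0, this RDD-based API has been in maintenance mode, and the DataFrame-based API in pyspark.ml is Spark's primary machine learning API.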

The mllib.classification module

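The spark.mllib package supports various methods for binary classification, multiclass classification, and regression analysis. Some common classification algorithms in spark.mllib are listed below; Naïve Bayes is in the mllib.classification module, while decision trees and random forests are in the mllib.tree module: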

  • Random Forest
  • Naïve Bayes
  • Decision trees

The mllib.clustering module

Clustering is an unsupervised learning technique in machine learning. Its goal is to group entities into subsets based on their similarity to one another. The mllib.clustering module provides several algorithms for this:

  • K-means
  • Bisecting k-means
  • Gaussian mixture
  • Power iteration clustering (PIC)
  • Latent Dirichlet allocation (LDA)
  • Streaming k-means

The mllib.regression module

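Regression aims to model the relationship between a dependent variable and one or more independent variables. Linear regression is one of the regression algorithms in the mllib.regression module, alongside its regularized variants (ridge regression and lasso) and isotonic regression. Linear regression is trained much like logistic regression: both fit a linear model by minimizing a loss function, but linear regression predicts a continuous value, whereas logistic regression predicts a class.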

The mllib.recommendation module

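In recommender systems, the most commonly used method is collaborative filtering. The mllib.recommendation module implements model-based collaborative filtering with the alternating least squares (ALS) algorithm: ALS.train handles explicit ratings, and ALS.trainImplicit handles implicit feedback. Cosine similarity is not part of this module; cosine similarities between the columns of a distributed RowMatrix can be computed with its columnSimilarities method in mllib.linalg.distributed.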

The mllib.linalg module

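The mllib.linalg module provides the local vector and matrix data types that MLlib's algorithms consume, along with some basic linear algebra operations on them. Vectors can be dense (DenseVector) or sparse (SparseVector) and are usually created with the Vectors factory; local matrices (DenseMatrix, SparseMatrix) are created with the Matrices factory. Distributed matrices, such as RowMatrix, live in the mllib.linalg.distributed submodule. Model evaluation metrics, such as a classifier's accuracy, are computed separately in the mllib.evaluation module.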

The mllib.fpm module

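The mllib.fpm module (fpm is short for frequent pattern mining) helps us mine frequent items, itemsets, and subsequences. It provides FP-growth (FPGrowth) for frequent itemsets and PrefixSpan for sequential patterns. Mining such patterns is often among the first steps in analyzing a large-scale dataset.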

Besides the ones listed above, various other algorithms are part of PySpark MLlib.
