...

Classification with PySpark MLlib

Learn how to use the logistic regression algorithm using PySpark MLlib.

We'll cover the following...

Logistic regression

In machine learning, classification models play a crucial role when the task is to predict distinct categories or classes.

Logistic regression

One of the most commonly used algorithms for classification is logistic regression. It predicts the probability of different outcomes, making it an ideal choice for scenarios where we want to determine the likelihood of an event occurring or not. In PySpark MLlib, logistic regression can be used for binary classification (two distinct classes) using binomial logistic regression or for multiclass classification (more than two classes) using multinomial logistic regression. The specific variant is determined by setting the family parameter accordingly or leaving it unset, and Spark can infer it if the parameter is left unset.

Multinomial logistic regression can be used for binary classification by setting the family parameter to “multinomial.” The output will be two sets of coefficients and two intercepts.

Note: During logistic regression fitting without intercept on a dataset with the constant nonzero column, PySpark MLlib outputs zero coefficients for constant nonzero columns. ...

Introduction to the Course

Introduction to Big Data

Exploring PySpark Core and RDDs

PySpark DataFrames and SQL

Customer Churn Analysis Using PySpark

Machine Learning with PySpark

Modeling with PySpark MLlib

Predicting Diabetes in Patients Using PySpark MLlib

Performance Optimization in PySpark

PySpark Optimization: Analyzing NYC Restaurants Data

Integrating PySpark with Other Big Data Tools

Wrap Up

Apriori Algorithm for Finding Frequent Itemsets with PySpark

Classification with PySpark MLlib

Logistic regression