Income Classification Using LightGBM and Scikit-Learn

In this project, we’ll build an income classifier using LightGBM to predict whether an individual’s income exceeds $50K based on census data. We’ll also analyze and preprocess the data, tune the LightGBM classifier, and interpret its predictions to gain insights.

You will learn to:

Load and preprocess data in Python.

Use cross-validation to estimate model performance and guide hyperparameter tuning.

Train a LightGBM model.

Explain model predictions using SHAP values.

Skills

Machine Learning

Data Visualization

Data Science

Explainable AI

Prerequisites

Good understanding of the Python programming language

Basic understanding of machine learning

Basic understanding of scikit-learn classifiers

Hands-on experience with pandas

Technologies

SHAP

SciPy

LightGBM

Matplotlib

scikit-learn

Project Description

Microsoft released LightGBM in 2016. It uses gradient-boosted decision trees (GBDT) as the underlying algorithm. GBDT is an ensemble technique that builds multiple decision trees sequentially, with each new tree learning from the errors of the previous ones. LightGBM has seen massive success in classification and regression competitions for its speed and accuracy.
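To make the idea of "each new tree learning from the errors of the previous ones" concrete, here is a minimal sketch in plain Python. It is not how LightGBM is implemented: it assumes a single numeric feature, squared-error loss, and one-split trees (stumps), and every function name in it is hypothetical. It only illustrates the sequential residual-fitting loop that gradient boosting is built on.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Find the one-threshold split on a 1-D feature that best fits the residuals."""
    best = None
    # Skip the largest value so the right-hand side of the split is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]  # (threshold, left_value, right_value)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted_stumps(xs, ys, n_trees=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the current errors."""
    base = mean(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        # The residuals are the errors left over by all previous stumps.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def predict_boosted(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return mean((y - predict_boosted(model, x)) ** 2 for x, y in zip(xs, ys))
```

Fitting this on a small toy dataset with an increasing number of stumps shows the training error shrinking as stumps are added. LightGBM applies the same principle with full decision trees, many features, and a range of loss functions.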

In this project, we’ll build an income classifier using LightGBM and census data. The model will predict whether an individual’s income exceeds $50K based on features such as age, education, and occupation. We’ll preprocess and visualize the data, tune hyperparameters with RandomizedSearchCV, and evaluate the model using precision, recall, and the F1 score.
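As a quick reference for the evaluation metrics mentioned above, the following sketch computes precision, recall, and the F1 score from scratch for a binary classifier. The function name and the toy labels are hypothetical, and 1 is assumed to stand for the ">$50K" class. In practice, scikit-learn provides equivalent functions in sklearn.metrics (precision_score, recall_score, and f1_score).

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Precision: of the individuals predicted as positive, how many truly are.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the truly positive individuals, how many the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy labels (made up for illustration): 1 = income above $50K, 0 = otherwise.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In the toy example, the model never raises a false alarm but misses most of the positive cases, so precision is high while recall and F1 are low. Reporting all three metrics exposes that kind of imbalance, which a single accuracy figure can hide.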

Project Tasks

Introduction

Task 0: Get Started

Task 1: Import Libraries

Task 2: Load the Adult Census Dataset

Task 3: Perform Exploratory Data Analysis

Data Preprocessing

Task 4: Encode the Categorical Features and Target

Task 5: Split Data into Train and Test Sets

Build a LightGBM Model

Task 6: Define a LightGBM Model

Task 7: Evaluate the Baseline Model

Task 8: Perform K-Fold Cross-Validation

Task 9: Define Parameters for Hyperparameter Tuning

Task 10: Use RandomizedSearchCV to Tune Hyperparameters

Task 11: Fit the Best Model on the Full Train Set

Task 12: Make Predictions on the Test Set

Model Evaluation and Interpretation

Task 13: Evaluate the Optimized Model

Task 14: Plot Feature Importance From the LightGBM Model

Task 15: Compute and Visualize the SHAP Values

Congratulations!
