Income Classification Using LightGBM and Scikit-Learn

In this project, we’ll build an income classifier using LightGBM to predict whether an individual’s income exceeds $50K based on census data. We’ll also analyze and preprocess the data, optimize the LightGBM classifier, and interpret its predictions to gain insights.

You will learn to:

Load and preprocess data in Python.

Use cross-validation to estimate model performance and guide hyperparameter tuning.

Train a LightGBM model.

Explain model predictions using SHAP values.


Machine Learning

Data Visualization

Data Science

Explainable AI


Good understanding of the Python programming language

Basic understanding of machine learning

Basic understanding of scikit-learn classifiers

Hands-on experience with pandas



Project Description

Microsoft open-sourced LightGBM in 2016. It uses gradient-boosted decision trees (GBDT) as the underlying algorithm. GBDT is an ensemble technique that builds multiple decision trees sequentially, with each new tree trained to correct the errors of the trees before it. LightGBM has been widely used in classification and regression competitions because it trains quickly and produces accurate models.
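To make the idea of sequential error correction concrete, here is a minimal sketch of gradient boosting in plain Python. It is not LightGBM's implementation: it assumes a squared-error regression loss, a single feature, and one-split trees (stumps), and the helper names (`fit_stump`, `predict_stump`, `boost`) and the toy data are hypothetical. Each round fits a stump to the residuals left by the current ensemble and adds a scaled version of it to the predictions.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes squared error on the residuals."""
    best = None
    # Skip the largest value: splitting there would leave the right side empty.
    # (Assumes xs contains at least two distinct values.)
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict_stump(stump, x):
    """Return the leaf value for x from a (threshold, left_mean, right_mean) stump."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Build stumps sequentially, each fitted to the current residuals."""
    base = sum(ys) / len(ys)              # start from the mean prediction
    predictions = [base] * len(xs)
    stumps, mse_history = [], []
    for _ in range(n_rounds):
        # Residuals are the errors the ensemble still makes.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Add a damped correction from the new stump.
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
        mse = sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)
        mse_history.append(mse)
    return base, stumps, mse_history


# Hypothetical toy data with a step-like pattern.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]

base, stumps, mse_history = boost(xs, ys)
print(f"MSE after round 1:  {mse_history[0]:.4f}")
print(f"MSE after round 20: {mse_history[-1]:.4f}")
```

The training error shrinks as rounds are added because every stump targets what the previous ones got wrong. For classification, a library such as LightGBM applies the same loop to the gradients of a classification loss rather than to raw residuals.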

In this project, we’ll build an income classifier using LightGBM and census data. The model will predict whether an individual’s income exceeds $50K based on features such as age, education, and occupation. We’ll preprocess and visualize the data, tune hyperparameters with RandomizedSearchCV, and evaluate the model using precision, recall, and the F1 score.
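As a quick reference for the evaluation metrics mentioned above, the sketch below computes precision, recall, and the F1 score by hand from true and predicted labels. The function name `precision_recall_f1` and the label lists are hypothetical and used only for illustration. In the project itself, these values would typically come from scikit-learn's metric functions rather than hand-written code.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Hypothetical labels: 1 means income above $50K, 0 means income at or below $50K.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints "precision=0.75 recall=0.75 f1=0.75"
```

Precision is the share of predicted high earners who actually are high earners, and recall is the share of actual high earners the model finds. The F1 score is their harmonic mean, which is useful when the two classes are imbalanced and accuracy alone would be misleading.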

Project Tasks



Task 0: Get Started

Task 1: Import Libraries

Task 2: Load the Adult Census Dataset

Task 3: Perform Exploratory Data Analysis


Data Preprocessing

Task 4: Encode the Categorical Features and Target

Task 5: Split Data into Train and Test Sets


Build a LightGBM Model

Task 6: Define a LightGBM Model

Task 7: Evaluate the Baseline Model

Task 8: Perform K-Fold Cross-Validation

Task 9: Define Parameters for Hyperparameter Tuning

Task 10: Use RandomizedSearchCV to Tune Hyperparameters

Task 11: Fit the Best Model on the Full Train Set

Task 12: Make Predictions on the Test Set


Model Evaluation and Interpretation

Task 13: Evaluate the Optimized Model

Task 14: Plot Feature Importance From the LightGBM Model

Task 15: Compute and Visualize the SHAP Values