PROJECT
Income Classification Using LightGBM and Scikit-Learn
In this project, we’ll build an income classifier using LightGBM to predict whether an individual’s income exceeds $50K based on census data. Along the way, we’ll analyze and preprocess the data, tune a LightGBM classifier, and interpret its predictions to gain insights.
You will learn to:
Load and preprocess data in Python.
Use cross-validation to estimate model performance and guide hyperparameter tuning.
Train a LightGBM model.
Explain model predictions using SHAP values.
Skills
Machine Learning
Data Visualization
Data Science
Explainable AI
Prerequisites
Good understanding of the Python programming language
Basic understanding of machine learning
Basic understanding of scikit-learn classifiers
Hands-on experience with pandas
Technologies
SHAP
SciPy
LightGBM
Matplotlib
Scikit-learn
Project Description
Microsoft released LightGBM, an open-source gradient boosting framework, in 2016. It is built on gradient-boosted decision trees (GBDT), an ensemble technique that builds decision trees sequentially, with each new tree trained to correct the errors of the trees before it. LightGBM is widely used in classification and regression competitions for its training speed and accuracy.
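To make the idea of "each new tree learning from the errors of the previous ones" concrete, here is a minimal from-scratch sketch of boosting with one-split trees (stumps) on a single feature and squared-error loss. It is a toy illustration only, not how LightGBM is implemented; the function names, the sample data, and the default settings are all hypothetical.

```python
# Toy gradient boosting with decision stumps (illustrative, not LightGBM's algorithm).

def fit_stump(xs, residuals):
    """Pick the threshold whose two leaf means best fit the residuals."""
    best = None
    # Exclude the largest value so that both leaves are always non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted_stumps(xs, ys, n_trees=20, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the current errors."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps, losses = [], []
    for _ in range(n_trees):
        # The residuals are the errors left over by all trees built so far.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for x, p in zip(xs, predictions)]
        # Mean squared error after adding this tree.
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    model = (base, learning_rate, stumps)
    return model, losses


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Made-up data: low targets for small x, high targets for large x.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
model, losses = fit_boosted_stumps(xs, ys)
print(f"MSE after first tree: {losses[0]:.4f}, after last tree: {losses[-1]:.4f}")
```

The training error shrinks as trees are added because every stump is fitted to what the ensemble still gets wrong. The learning rate scales down each tree's contribution, which is the same role the learning rate hyperparameter plays when tuning a boosted model.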
In this project, we’ll build an income classifier using LightGBM and census data. The model will predict whether an individual’s income exceeds $50K based on features such as age, education, and occupation. We’ll preprocess and visualize the data, tune hyperparameters with RandomizedSearchCV, and evaluate the model using precision, recall, and the F1 score.
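As a quick reference for the evaluation metrics named above, the following sketch computes precision, recall, and the F1 score by hand from predicted and true labels. The helper name, the sample labels, and the encoding (1 for income above $50K, 0 otherwise) are assumptions for illustration. In practice, scikit-learn's precision_score, recall_score, and f1_score in sklearn.metrics compute the same quantities.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    # Precision: of everything predicted positive, how much was correct?
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of everything actually positive, how much was found?
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels: 1 = income above $50K, 0 = income at or below $50K (assumed encoding).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

These metrics are more informative than accuracy alone when one class is much rarer than the other, since a model that always predicts the majority class can score high accuracy while having zero recall on the minority class.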
Project Tasks
1. Introduction
Task 0: Get Started
Task 1: Import Libraries
Task 2: Load the Adult Census Dataset
Task 3: Perform Exploratory Data Analysis
2. Data Preprocessing
Task 4: Encode the Categorical Features and Target
Task 5: Split Data into Train and Test Sets
3. Build a LightGBM Model
Task 6: Define a LightGBM Model
Task 7: Evaluate the Baseline Model
Task 8: Perform K-Fold Cross-Validation
Task 9: Define Parameters for Hyperparameter Tuning
Task 10: Use RandomizedSearchCV to Tune Hyperparameters
Task 11: Fit the Best Model on the Full Train Set
Task 12: Make Predictions on the Test Set
4. Model Evaluation and Interpretation
Task 13: Evaluate the Optimized Model
Task 14: Plot Feature Importance From the LightGBM Model
Task 15: Compute and Visualize the SHAP Values
Congratulations!
Atabek BEKENOV
Senior Software Engineer
Pradip Pariyar
Senior Software Engineer
Renzo Scriber
Senior Software Engineer
Vasiliki Nikolaidi
Senior Software Engineer
Juan Carlos Valerio Arrieta
Senior Software Engineer