PROJECT
Text Classification Using PyTorch
In this project, we will build a deep-learning-based text classifier using PyTorch, covering text preprocessing, feature extraction, model selection, training, and evaluation. We will use classical Python NLP libraries such as NLTK and explore traditional machine learning algorithms such as XGBoost in addition to neural networks.
You will learn to:
Clean and extract features from text.
Build and train machine learning and deep learning models.
Use contextualized embeddings and pretrained language models.
Handle imbalanced data effectively.
Skills
Natural Language Processing
Neural Networks
Machine Learning Fundamentals
Deep Learning
Transformer Models
Prerequisites
Intermediate knowledge of the Python programming language
Basic knowledge of the pandas library
Basic knowledge of machine learning paradigms and techniques
Basic knowledge of the PyTorch framework
Technologies
NLTK
Pandas
XGBoost
PyTorch
Scikit-learn
Project Description
Text classification is a fundamental task in natural language processing (NLP) that aims to automatically categorize text documents into predefined classes. It has numerous real-world applications, such as sentiment analysis, spam detection, topic classification, customer feedback analysis, and, more recently, detecting whether a text was generated by an AI model.
In this project, we’ll practice preprocessing text data, extracting meaningful features, and training machine learning models to perform classification. Specifically, we’ll build a question classifier. The project emphasizes the use of neural networks, including pretrained language models, while also providing an introduction to traditional machine learning techniques. We’ll use popular Python NLP libraries and frameworks like NLTK, scikit-learn, and PyTorch.
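To make the first two steps concrete, here is a minimal sketch of text preprocessing followed by bag-of-words and TF-IDF feature extraction. It uses only the Python standard library so that the arithmetic stays visible; the project itself relies on NLTK and scikit-learn for these steps. The stop-word list, the function names, and the sample questions are all assumptions made for illustration and are not taken from the project.

```python
import math
import re
from collections import Counter

# Tiny hypothetical stop-word list (an assumption for this sketch);
# the project uses NLTK's resources instead.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "to", "do", "does"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs):
    """Return the sorted vocabulary and one raw count vector per document."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    index = {term: i for i, term in enumerate(vocab)}
    vectors = []
    for tokens in tokenized:
        vec = [0] * len(vocab)
        for term, count in Counter(tokens).items():
            vec[index[term]] = count
        vectors.append(vec)
    return vocab, vectors


def tf_idf(vectors):
    """Reweight counts with a smoothed IDF: log((1 + n) / (1 + df)) + 1."""
    n_docs = len(vectors)
    n_terms = len(vectors[0]) if vectors else 0
    # Document frequency: in how many documents each term occurs.
    df = [sum(1 for vec in vectors if vec[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return [[count * w for count, w in zip(vec, idf)] for vec in vectors]


# Made-up sample questions, not from the project's dataset.
questions = [
    "What is the capital of France?",
    "Who wrote the novel Dracula?",
    "What does the abbreviation NLP mean?",
]
vocab, counts = bag_of_words(questions)
weights = tf_idf(counts)
```

In this sketch, "what" occurs in two of the three questions, so it receives a lower TF-IDF weight than "capital", which occurs in only one. That is the property that makes TF-IDF features more informative than raw counts for a classifier.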
Project Tasks
1. Introduction
Task 0: Get Started
Task 1: Import Libraries and Explore Datasets
2. Data Preparation and Basic Feature Engineering
Task 2: Preprocess Text
Task 3: Split the Data
Task 4: Extract Features (BoW)
Task 5: Extract Features (TF-IDF)
3. Linear and Tree Models
Task 6: Train a Linear Model
Task 7: Tune Hyperparameters
Task 8: Train an Ensemble Model
Task 9: Evaluate the Model
4. Neural Networks
Task 10: Define a Neural Network
Task 11: Create Datasets and DataLoaders
Task 12: Set Up Training
Task 13: Train and Evaluate the Neural Network
Task 14: Get Word Embeddings
Task 15: Set Up Training
Task 16: Train and Evaluate the Neural Network
Task 17: Get Embeddings from Pretrained Language Models
Task 18: Set Up Training
Task 19: Train and Evaluate the Neural Network
5. Data Imbalance
Task 20: Handle Imbalanced Data
Task 21: Train and Evaluate the Neural Network
Task 22: Save a Neural Network
Congratulations!