You will learn to:
Clean network traffic by removing redundancies.
Create data visualizations in Python.
Create machine learning based classifiers to detect cyber attacks.
Evaluate the accuracy of machine learning based classifiers.
Skills
Machine Learning
Data Science
Cyber Security
Intrusion Detection
Prerequisites
Basic knowledge of Python programming
Basic understanding of machine learning
Basic knowledge of plotting
Technologies
NumPy
Python
Pandas
Matplotlib
Scikit-learn
Project Description
Cyber attacks are increasingly frequent and sophisticated, making intrusion detection systems critical for network security. Machine learning offers a powerful approach to cyber attack detection by learning patterns from network traffic and automatically flagging malicious activities before they cause damage. This technology powers modern security operations centers, protecting everything from corporate networks to critical infrastructure.
In this project, we'll build intrusion detection classifiers using Python, scikit-learn, and the SIMARGL2021 dataset, a publicly available collection of benign and malicious network traffic. We'll start by exploring and preprocessing the dataset with Pandas and NumPy, analyzing features, identifying attack types, and removing redundant data. Using Matplotlib for data visualization, we'll examine the distribution of attack labels and understand class imbalances. We'll then standardize features, encode categorical variables, and split the data into training and testing sets for model evaluation.
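The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it uses a small synthetic table in place of SIMARGL2021, and the column names (`bytes_in`, `duration_ms`, `protocol`, `label`) and the helper `preprocess` are hypothetical. It also fits the scaler on the training split only, which is one common choice and may differ from the order the project follows.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Synthetic stand-in for network flow records (hypothetical columns,
# NOT the real SIMARGL2021 schema).
rng = np.random.default_rng(0)
n = 200
flows = pd.DataFrame({
    "bytes_in": rng.integers(40, 1500, size=n),
    "duration_ms": rng.integers(1, 5000, size=n),
    "protocol": rng.choice(["tcp", "udp", "icmp"], size=n),
    "label": rng.choice(["benign", "dos", "scan"], size=n, p=[0.7, 0.2, 0.1]),
})
# Append repeated rows to mimic redundant traffic records.
flows = pd.concat([flows, flows.iloc[:20]], ignore_index=True)


def preprocess(df, label_col="label", test_size=0.25, seed=0):
    """Deduplicate, encode, split, and standardize a flow table."""
    # Remove exact duplicate rows (redundant records).
    df = df.drop_duplicates().reset_index(drop=True)

    # Encode string labels as integers and one-hot encode categorical features.
    y = LabelEncoder().fit_transform(df[label_col])
    X = pd.get_dummies(df.drop(columns=[label_col])).astype(float)

    # Stratify so each attack type appears in both subsets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )

    # Fit the scaler on training data only, then apply it to both subsets.
    scaler = StandardScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test


# Label counts reveal class imbalance before any model is trained.
label_counts = flows.drop_duplicates()["label"].value_counts()
print(label_counts)

X_train, X_test, y_train, y_test = preprocess(flows)
print(X_train.shape, X_test.shape)
```

With Matplotlib installed, `label_counts.plot(kind="bar")` would draw the label distribution as a bar chart.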
We'll train and compare three machine learning classifiers: Random Forest for ensemble predictions, Decision Tree for interpretable rules, and Gaussian Naive Bayes for probabilistic classification. After training each model, we'll test its intrusion detection performance on unseen network traffic and compare the models on classification accuracy, precision, recall, and training time. By the end, you'll have a working machine learning intrusion detection system that demonstrates scikit-learn classification, network traffic analysis, feature engineering, multi-class classification, and model comparison, skills that carry over to other anomaly detection and security monitoring projects.
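The comparison described above might look something like the sketch below. It assumes synthetic multi-class data from scikit-learn's `make_classification` rather than the real dataset, and the helper name `compare_models`, the macro averaging for precision and recall, and all hyperparameters are illustrative choices, not the project's settings.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced three-class data standing in for labeled traffic.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6,
    n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Gaussian Naive Bayes": GaussianNB(),
}


def compare_models(models, X_train, y_train, X_test, y_test):
    """Fit each model, time the fit, and score it on held-out data."""
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        train_seconds = time.perf_counter() - start

        y_pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            # Macro averaging weights every class equally, so rare attack
            # types count as much as the dominant benign class.
            "precision": precision_score(
                y_test, y_pred, average="macro", zero_division=0
            ),
            "recall": recall_score(
                y_test, y_pred, average="macro", zero_division=0
            ),
            "train_seconds": train_seconds,
        }
    return results


results = compare_models(models, X_train, y_train, X_test, y_test)
for name, scores in results.items():
    print(name, {k: round(v, 4) for k, v in scores.items()})
```

Timing only the `fit` call keeps training time separate from prediction time. Measured times vary by machine, so they are best read as relative rather than absolute.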
Project Tasks
1. Data Preprocessing
Task 0: Get Started
Task 1: Import Libraries and Modules
Task 2: Preprocess the Dataset
Task 3: Explore the Dataset
Task 4: Standardize and Encode the Data
Task 5: Separate Labels and Split the Data into Train and Test Subsets
2. Train and Test Random Forest
Task 6: Train the Random Forest Classifier
Task 7: Test the Random Forest Classifier
3. Train and Test Decision Tree
Task 8: Train the Decision Tree Classifier
Task 9: Test the Decision Tree Classifier
4. Train and Test Naive Bayes
Task 10: Train the Naive Bayes Classifier
Task 11: Test the Naive Bayes Classifier
5. Compare Attack Detection Capability
Task 12: Compare the Accuracy and Training Times
Congratulations!
Atabek BEKENOV
Senior Software Engineer
Pradip Pariyar
Senior Software Engineer
Renzo Scriber
Senior Software Engineer
Vasiliki Nikolaidi
Senior Software Engineer
Juan Carlos Valerio Arrieta
Senior Software Engineer
Relevant Courses
Use the following content to review prerequisites or explore specific concepts in detail.