Machine Learning with Python A Practical Beginners’ Guide by Oliver Theobald-01.png

This course teaches you how to code basic machine learning models. The content is designed for beginners with general knowledge of machine learning, including common algorithms such as linear regression, logistic regression, SVM, KNN, decision trees, and more. If you need a refresher, we have summarized key concepts from machine learning, and there are overviews of specific algorithms dispersed throughout the course.

A Practical Guide to Machine Learning with Python

## Quick overview 
Preparing data for further processing generally starts by removing variables that aren’t compatible with the chosen algorithm or variables that are deemed less relevant to your target output. Determining which variables to remove from the dataset is generally done using exploratory data analysis and domain knowledge.

During exploratory data analysis, it is often helpful to start by checking the data type of each variable (e.g., string, Boolean, or integer) and the correlation between variables. Domain knowledge, meanwhile, is useful for spotting duplicate variables, such as country and country code, and for eliminating less relevant variables such as latitude and longitude.
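
The checks described above can be sketched with pandas as follows. This is a minimal illustration, not the course's own code: the dataframe and its column names are hypothetical, and restricting the correlation to numeric columns with `select_dtypes` is one of several reasonable ways to do it.

```python
import pandas as pd

# Hypothetical toy dataset; the column names are illustrative only.
df = pd.DataFrame({
    "country": ["Australia", "Japan", "Brazil", "Japan"],
    "country_code": ["AU", "JP", "BR", "JP"],
    "latitude": [-33.87, 35.68, -23.55, 34.69],
    "price": [120.0, 95.5, 60.0, 88.0],
})

# 1. Inspect the data type of each variable.
print(df.dtypes)

# 2. Inspect pairwise correlation between the numeric variables only.
corr = df.select_dtypes(include="number").corr()
print(corr)

# 3. Check whether two columns carry the same information:
#    if every country maps to exactly one country code, one of them is redundant.
codes_per_country = df.groupby("country")["country_code"].nunique()
is_duplicate_variable = bool(codes_per_country.max() == 1)
print(is_duplicate_variable)
```

Whether a highly correlated or duplicated variable should actually be removed still depends on the algorithm and on domain knowledge; the output of these checks is only a starting point.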

> **Note:** In Python, variables can be removed from the dataframe using the `del` statement, followed by the variable name of the dataframe and the title of the column you wish to remove. The column title should be enclosed in quotation marks inside square brackets, as shown here:

```python
del df['latitude']
del df['longitude']
```
> **Note:** This code example, like other changes made inside your notebook, won’t alter the source file of the dataset. You can restore variables removed from the development environment by deleting the relevant line(s) of code and re-running the notebook so that the dataset is loaded again. In fact, it’s common to reverse the removal of features when testing the model with different combinations of variables.
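
To illustrate that removing columns only changes the in-memory dataframe, here is a small sketch. It makes a few assumptions: the CSV text is a hypothetical stand-in for a dataset file on disk, and the final step uses `DataFrame.drop`, an alternative to `del` that this lesson does not cover, which returns a new dataframe and leaves the original untouched.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the dataset's source file.
csv_text = (
    "country,latitude,longitude,price\n"
    "Australia,-33.87,151.21,120\n"
    "Japan,35.68,139.69,95\n"
)

# Load the data and remove two columns from the in-memory dataframe.
df = pd.read_csv(io.StringIO(csv_text))
del df["latitude"]
del df["longitude"]
remaining = list(df.columns)
print(remaining)

# The source is unchanged, so loading it again restores the removed columns.
df_reloaded = pd.read_csv(io.StringIO(csv_text))
restored = list(df_reloaded.columns)
print(restored)

# Alternative (not from the lesson): drop() returns a new dataframe,
# which makes it easy to compare different combinations of variables.
df_trimmed = df_reloaded.drop(columns=["latitude", "longitude"])
print(list(df_trimmed.columns))
```

Keeping the full dataframe and deriving trimmed copies from it can be convenient when you want to compare models trained on different feature sets within the same notebook session.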

---

This lesson will introduce you to ways of removing redundant or unhelpful data variables.

Data Scrubbing Operation: Removing Variables
