Machine Learning with Python A Practical Beginners’ Guide by Oliver Theobald-01.png


This course teaches you how to code basic machine learning models. The content is designed for beginners with general knowledge of machine learning, including common algorithms such as linear regression, logistic regression, SVM, KNN, decision trees, and more. If you need a refresher, we have summarized key concepts from machine learning, and there are overviews of specific algorithms dispersed throughout the course.

A Practical Guide to Machine Learning with Python

## 4) Scale data
Next, you will import the Scikit-learn class `StandardScaler`, which standardizes each feature by subtracting its mean and dividing by its standard deviation, so that every feature ends up with a mean of zero and unit variance. The mean and standard deviation are computed and stored when the scaler is fitted, and are used later by the `transform` method, which returns the dataset with the standardized values (by default as a NumPy array rather than a data frame).
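To make the stored statistics concrete, here is a minimal sketch using a small made-up array (the values and the two-column layout are assumptions for illustration, not the course dataset). It checks that `transform` produces the same result as applying `(x - mean) / std` by hand with the values the scaler stored during `fit`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different scales.
X = np.array([[25.0, 40000.0],
              [35.0, 60000.0],
              [45.0, 80000.0],
              [55.0, 100000.0]])

scaler = StandardScaler()
scaler.fit(X)  # learns and stores the per-feature mean and standard deviation

# The stored statistics are exposed as attributes after fitting.
means = scaler.mean_
stds = scaler.scale_

# transform applies z = (x - mean) / std using the stored values.
X_scaled = scaler.transform(X)
X_manual = (X - means) / stds

print(np.allclose(X_scaled, X_manual))
```

Because the statistics live on the fitted scaler, the same object can later standardize new rows with the means and standard deviations learned here.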

After importing `StandardScaler`, you can assign an instance of it to a new variable, fit it to the features contained in the data frame, and transform those values under a new variable name.
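The import, fit, and transform steps described above might look like the following sketch. The data frame `df` and its column names are hypothetical stand-ins for the features used in the course, and the names `scaler`, `scaled_data`, and `scaled_df` are likewise chosen only for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical data frame standing in for the course features.
df = pd.DataFrame({
    "Age": [22, 35, 47, 58, 63],
    "Daily Time Spent on Site": [80.2, 69.5, 74.1, 55.3, 61.8],
    "Area Income": [61833.9, 68441.8, 59785.9, 54806.2, 73889.9],
})

scaler = StandardScaler()            # assign the scaler to a variable
scaler.fit(df)                       # learn each column's mean and standard deviation
scaled_data = scaler.transform(df)   # standardized values as a NumPy array

# Optional: wrap the array back into a data frame to keep the column names.
scaled_df = pd.DataFrame(scaled_data, columns=df.columns, index=df.index)
print(scaled_df.round(2))
```

The two calls can also be combined with `scaler.fit_transform(df)`, which fits and transforms in one step.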

`StandardScaler` is often used in conjunction with PCA and other algorithms, including k-nearest neighbors and support vector machines, to rescale and standardize data features. After standardization, each feature has a mean of zero and a standard deviation of one, which matches the location and scale of a standard normal distribution, although the shape of each feature's distribution is left unchanged.
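As one possible illustration of pairing `StandardScaler` with a distance-based algorithm, the sketch below chains it with k-nearest neighbors in a Scikit-learn pipeline. The wine dataset bundled with Scikit-learn, the 70/30 split, and `n_neighbors=5` are all assumptions made for this example and do not come from the course.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Example dataset shipped with Scikit-learn (an assumption for this sketch).
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# The pipeline fits the scaler on the training data only, then reuses the
# stored mean and standard deviation when scoring the test data.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy with scaling: {accuracy:.2f}")
```

Wrapping the scaler in a pipeline is a design choice that keeps test data out of the fitting step, so the scaling statistics come from the training set alone.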

Without standardization, PCA is likely to lock onto whichever features have the largest variance, and the unit of measurement alone can exaggerate that effect. Notice that the variance of `Age` changes dramatically when it is measured in days rather than in years. If left unchecked, this kind of unit difference can mislead the selection of components, which is based on maximizing variance. `StandardScaler` helps to avoid this problem by rescaling and standardizing variables.
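The effect of units on PCA can be sketched with synthetic data. Everything below is made up for illustration (the random ages, the `income` column, and the sample size are assumptions): converting age from years to days multiplies its variance by 365 squared, and without scaling the first principal component is almost entirely that one column.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic, independent features (hypothetical values).
age_years = rng.uniform(20, 60, size=200)
income = rng.normal(50, 10, size=200)   # in thousands
age_days = age_years * 365              # same information, different unit

var_years = age_years.var()
var_days = age_days.var()               # 365**2 times var_years

X_days = np.column_stack([age_days, income])

# Unscaled: the first component is dominated by the high-variance column.
pca_raw = PCA(n_components=2).fit(X_days)
raw_ratio = pca_raw.explained_variance_ratio_[0]

# Standardized: both features contribute on an equal footing.
X_scaled = StandardScaler().fit_transform(X_days)
pca_scaled = PCA(n_components=2).fit(X_scaled)
scaled_ratio = pca_scaled.explained_variance_ratio_[0]

print(f"Variance of age in years: {var_years:.1f}")
print(f"Variance of age in days:  {var_days:.1f}")
print(f"First-component share, unscaled: {raw_ratio:.4f}")
print(f"First-component share, scaled:   {scaled_ratio:.4f}")
```

In the unscaled case the first component's share of variance is close to one simply because of the unit chosen for age. After standardization the share falls to roughly what two unrelated features would be expected to split.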

Conversely, standardization might not be necessary for PCA if the scale of the variables is relevant to your analysis or is already consistent across variables. Further information regarding `StandardScaler` can be found in the [Scikit-learn documentation](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html).






We will continue to steps 4-6 of the principal component analysis.

PCA Implementation Steps: 4 to 6
