Scikit-Learn is a powerful library that provides a wide range of supervised and unsupervised learning algorithms. If you’re serious about a career in machine learning, scikit-learn is a must-know.

In this course, you will start by working with the datasets that scikit-learn ships with or can download, such as Iris and MNIST. You will then learn about feature engineering and, more specifically, feature selection, feature extraction, and dimensionality reduction.

In the latter half of the course, you will dive into linear and logistic regression, where you’ll work through a few challenges to test your understanding. Lastly, you will focus on unsupervised learning and deep learning, where you’ll get into k-means clustering and neural networks.

By the end of this course, you will have a valuable new skill to add to your resume, and you’ll be ready to start your own projects with scikit-learn.

Hands-on Machine Learning with Scikit-Learn

## Why missing values are important

Missing values are very common in real-world datasets. For various reasons, a dataset may encode missing values as blanks, `nan`, `inf`, or some other designated placeholder. In some cases, ordinary-looking values such as `0` or `1` are also used to mark a missing value. Why do we care about missing values?
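
As a sketch of what these different encodings look like in practice, the NumPy snippet below builds a small toy array (made-up values) and converts `inf` and a sentinel `0` into `np.nan`, so that every kind of missing value can be counted the same way. Treating `0` as "missing" is an assumption for illustration; whether such a value really means missing depends on the dataset.

```python
import numpy as np

# Toy feature matrix (made-up values): np.nan, np.inf and a sentinel 0
# all stand for "missing" in this hypothetical dataset.
X = np.array([
    [1.0, 2.0, np.nan],
    [4.0, np.inf, 6.0],
    [0.0, 8.0, 9.0],
])

SENTINEL = 0.0  # hypothetical placeholder value that means "missing"

# Normalize every missing-value encoding to np.nan.
X_clean = X.copy()
X_clean[np.isinf(X_clean)] = np.nan
X_clean[X_clean == SENTINEL] = np.nan

# Count missing entries per column.
missing_per_column = np.isnan(X_clean).sum(axis=0)
print(missing_per_column)  # [1 1 1]
```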

1. Some algorithms, or at least some implementations of them, cannot handle missing values; they assume the dataset is complete.
2. Missing values can hurt the performance of the model.

In most cases, the first reason is the main one.
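
To see the first reason in action, here is a minimal sketch (toy data, made up for illustration) that fits `LinearRegression` on a feature column containing a single `nan`. scikit-learn's input validation for this estimator rejects non-finite input and raises a `ValueError`; the exact error message differs between versions, so the snippet only relies on the exception type.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny toy regression problem with one missing entry (made-up values).
X = np.array([[1.0], [2.0], [np.nan], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

try:
    LinearRegression().fit(X, y)
    fit_succeeded = True
except ValueError as err:
    # Input validation rejects NaN before any fitting happens.
    fit_succeeded = False
    print("Fit failed:", err)
```

Not every estimator behaves this way: `HistGradientBoostingClassifier` and `HistGradientBoostingRegressor`, for instance, accept `nan` values natively. For most others, the missing values have to be dealt with before calling `fit`.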

In some cases, you may consider simply dropping the rows or columns that have too many missing values. This works well when only a small part of the data is dropped. When a large part of the data would be dropped, however, it causes other problems: dropping a whole column, for example, throws away all the information that column contains. An alternative is to `impute` the missing values, that is, to fill them in with estimated values. `sklearn` provides several tools for missing value imputation.
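
The cost of dropping can be made concrete with a short NumPy sketch. It applies the strictest version of the rule (drop anything with at least one missing value) to a toy 4-by-3 array of made-up values; in practice you would more likely use a threshold on the fraction of missing values per row or column.

```python
import numpy as np

# Toy data (made-up values); np.nan marks missing entries.
X = np.array([
    [1.0, np.nan, 3.0],
    [4.0, np.nan, 6.0],
    [7.0, 8.0, 9.0],
    [np.nan, 11.0, 12.0],
])

missing = np.isnan(X)

# Option 1: drop every row that has at least one missing value.
X_rows_dropped = X[~missing.any(axis=1)]

# Option 2: drop every column that has at least one missing value.
X_cols_dropped = X[:, ~missing.any(axis=0)]

print(X_rows_dropped.shape)  # (1, 3): three of four samples are lost
print(X_cols_dropped.shape)  # (4, 1): two of three features are lost
```

With only four missing entries out of twelve, either option discards most of the dataset, which is why imputation is often the better choice.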

In this lesson, we'll look at how to handle missing values with sklearn.
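
As a minimal sketch of imputation, `SimpleImputer` from `sklearn.impute` replaces each missing entry with a per-column statistic learned during `fit`. The toy array and the choice of `strategy="mean"` below are assumptions for illustration; the lesson's own examples may use different data or settings.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (made-up values) with one missing entry per column.
X = np.array([
    [1.0, 2.0],
    [np.nan, 4.0],
    [7.0, np.nan],
])

# Replace each np.nan with the mean of the observed values in its column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_imputed = imputer.fit_transform(X)

print(imputer.statistics_)  # [4. 3.]
print(X_imputed)            # rows: [1, 2], [4, 4], [7, 3]
```

`missing_values` can also be set to a sentinel such as `0`, and `strategy` accepts `"median"`, `"most_frequent"`, and `"constant"` (together with `fill_value`) as alternatives to `"mean"`.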