
The Dataset and Exploratory Data Analysis

Learn how to perform exploratory data analysis (EDA) on the breast cancer dataset.

We'll cover the following:

The breast cancer dataset
Basic imports
Load data and EDA

We have learned two models for classification: logistic regression and KNN. According to the no free lunch theorem, no single model performs best on every problem, so we must compare candidate models to find the one that works best for our data.
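Because of the no free lunch theorem, the choice between the two models is made empirically. Below is a minimal sketch, assuming scikit-learn is installed, that compares logistic regression and KNN with 5-fold cross-validation on the breast cancer data. The standard scaler, `n_neighbors=5`, and `max_iter=1000` are illustrative assumptions, not settings taken from this course.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Features X and labels y (0 = malignant, 1 = benign)
X, y = load_breast_cancer(return_X_y=True)

# Both models are preceded by a scaler, since KNN distances and
# regularized logistic regression are sensitive to feature scale
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Mean 5-fold cross-validated accuracy for each candidate model
scores = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in models.items()}

for name, score in scores.items():
    print(f"{name}: mean CV accuracy = {score:.3f}")

# Pick the model with the highest mean accuracy on this data
best_model = max(scores, key=scores.get)
print("best:", best_model)
```

Cross-validation is used here instead of a single train/test split so that the comparison does not hinge on one lucky or unlucky partition of only 569 observations.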

The breast cancer dataset

Most benign tumors are not dangerous because they can’t spread throughout the body (benign brain tumors, however, can be life-threatening). They don’t invade neighboring tissue and can usually be removed with a low risk of growing back. However, benign tumors can still cause other health problems, and through tumor progression, some types can become malignant (cancerous).

Breast cancer is one of the most common cancers in women. The original breast cancer dataset has 569 observations and 30 numeric features. The target classes are M (malignant) and B (benign) tumors, with 212 malignant cases (encoded as 0) and 357 benign cases (encoded as 1).
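The figures above can be checked directly. A minimal sketch, assuming scikit-learn (with pandas, needed for `as_frame=True`) is installed, that loads the copy of the dataset bundled with scikit-learn and confirms its shape, label encoding, and class distribution:

```python
from sklearn.datasets import load_breast_cancer

# Load the dataset as a pandas DataFrame: 30 feature columns plus "target"
data = load_breast_cancer(as_frame=True)
df = data.frame

n_samples, n_features = data.data.shape
print(n_samples, n_features)  # 569 30

# Map each integer label to its class name
label_names = {i: str(name) for i, name in enumerate(data.target_names)}
print(label_names)  # {0: 'malignant', 1: 'benign'}

# Number of observations per class, ordered by label
class_counts = df["target"].value_counts().sort_index()
print(class_counts)
```

The classes are imbalanced (roughly 37% malignant), which is worth keeping in mind when reading accuracy scores later.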

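The dataset is built from 10 real-valued features computed for each cell nucleus. For each image, the mean, standard error, and "worst" (largest) value of every feature are recorded, which yields the 30 features. The 10 base features are: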

Radius: Mean of distances from the center to points on the perimeter.
Texture: Standard deviation of grayscale values.
Perimeter: Total length of a shape’s boundary.
Area: ...
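To see how the 10 base features expand into 30 columns, and to take a first look at the data, here is a minimal EDA sketch, assuming scikit-learn and pandas are installed. It relies on scikit-learn's column naming ("mean radius", "radius error", "worst radius", and so on); the choice of summary statistics and of which columns to display is illustrative.

```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)
df = data.frame

# Group the 30 columns by their suffix/prefix: mean, standard error, worst
names = [str(c) for c in data.feature_names]
mean_cols = [c for c in names if c.startswith("mean ")]
error_cols = [c for c in names if c.endswith(" error")]
worst_cols = [c for c in names if c.startswith("worst ")]
print(len(mean_cols), len(error_cols), len(worst_cols))  # 10 10 10

# Summary statistics of the "mean" features
print(df[mean_cols].describe().T[["mean", "std", "min", "max"]])

# Average feature values per class (0 = malignant, 1 = benign)
by_class = df.groupby("target")[mean_cols].mean()
print(by_class[["mean radius", "mean texture", "mean perimeter", "mean area"]])
```

Comparing the per-class averages is a quick way to spot features that separate the classes; for example, malignant nuclei tend to have a larger mean radius and area than benign ones.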
