What is sklearn.datasets.load_breast_cancer in Python?

In Python machine learning, scikit-learn is a widely used library. It ships with several small datasets that are very easy to access, one of which is the breast cancer Wisconsin (diagnostic) dataset, loaded with the load_breast_cancer function.

Uses

This dataset is used to train and evaluate machine learning algorithms that classify cancer scans as benign (non-cancerous) or malignant (cancerous).
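As an illustration of this use, here is a minimal sketch that trains a classifier on the dataset. The choice of logistic regression with feature scaling, the 75/25 split, and the random seed are assumptions made for the example, not something the dataset prescribes.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load features (X) and labels (y) directly as arrays.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the samples for testing, keeping the class ratio (assumed split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Fraction of held-out samples classified correctly.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Any other scikit-learn classifier could be swapped in for LogisticRegression in the same pipeline.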

Parameters

return_X_y (boolean): The default value for this parameter is False. If it is set to True, the function returns a (data, target) tuple instead of a Bunch object.
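The difference between the two return styles can be seen in a short sketch. The variable names bunch, X, and y are chosen only for illustration.

```python
from sklearn.datasets import load_breast_cancer

# Default (return_X_y=False): a dictionary-like Bunch object.
bunch = load_breast_cancer()
print(type(bunch).__name__)

# return_X_y=True: a (data, target) tuple of NumPy arrays.
X, y = load_breast_cancer(return_X_y=True)
print(X.shape, y.shape)
```

The tuple form is convenient when the arrays are passed straight to an estimator and the metadata is not needed.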

Syntax for loading dataset

from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()

Features

This is a binary classification dataset.

It has no missing attributes or null values.

The class distribution is as follows.

212: malignant
357: benign

This is a commonly used dataset. Machine learning papers often use it as a benchmark for classification problems.

All 30 features are numerical.
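The properties listed above can be checked with a few lines of NumPy. This is only a quick sanity check, and the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)

# Count the samples per class: index 0 is malignant, index 1 is benign.
counts = np.bincount(y)
print("malignant:", counts[0], "benign:", counts[1])

# True if any feature value is NaN (missing).
has_missing = bool(np.isnan(X).any())
print("missing values:", has_missing)

# The features are stored as a floating-point array.
print("feature dtype:", X.dtype)
```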

Code

Load the dataset:
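A minimal sketch of the loading step follows. The variable name cancer is an arbitrary choice, and the exact set of keys printed can vary between scikit-learn versions.

```python
from sklearn.datasets import load_breast_cancer

# Load the dataset as a dictionary-like Bunch object.
cancer = load_breast_cancer()

# List the keys available on the returned object.
print(cancer.keys())
```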

Calling load_breast_cancer() returns a dictionary-like Bunch object with the following keys.

data: The feature values used to classify a scan as benign or malignant. It can also be called feature data.
target: The class label of each sample, which is what we want to predict. The labels are encoded as 0 (malignant) and 1 (benign).
target_names: Names of the target classes (malignant and benign).
feature_names: Names of all the features available in this dataset: radius, texture, compactness, concavity, concave points, perimeter, area, smoothness, etc.
DESCR: Description of the dataset.
filename: Name of the CSV file that stores the data.
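The fields described above can be read as attributes of the returned object. The sketch below prints a small sample of each. The slice sizes are arbitrary, and filename is left out because its exact value depends on the scikit-learn version.

```python
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# Feature matrix: one row per sample, one column per feature.
print(cancer.data.shape)

# Class labels for the first few samples (0 = malignant, 1 = benign).
print(cancer.target[:5])

# Human-readable class names, indexed by label.
print(cancer.target_names)

# Names of the first three feature columns.
print(cancer.feature_names[:3])

# Beginning of the plain-text dataset description.
print(cancer.DESCR[:200])
```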

License: Creative Commons-Attribution-ShareAlike 4.0 (CC-BY-SA 4.0)