What is sklearn.datasets.load_breast_cancer in Python?
scikit-learn is a Python library for machine learning. It ships with several small datasets that are easy to load, one of which is the Breast Cancer Wisconsin (Diagnostic) dataset, available through the load_breast_cancer function.
Uses
This dataset is used to train and evaluate machine learning algorithms that classify breast cancer scans as malignant or benign.
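As a minimal sketch of this use, the snippet below trains a classifier on the dataset and reports its accuracy on held-out samples. The model (LogisticRegression after StandardScaler), the 75/25 split, and random_state=0 are illustrative assumptions, not something the dataset prescribes.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load features X and labels y (0 = malignant, 1 = benign)
X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the samples for testing, keeping the class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumed model choice: scale the features, then fit logistic regression
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Fraction of correctly classified test samples
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Scaling matters here because the features span very different ranges (for example, area versus smoothness), and logistic regression converges more reliably on standardized inputs.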
Parameters
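return_X_y (boolean): The default value for this parameter is False, in which case the function returns a Bunch object holding the data together with its metadata. If True, it returns a (data, target) tuple instead.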
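The snippet below is a small sketch comparing the two return styles. The variable names bunch, X, and y are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer

# Default (return_X_y=False): a Bunch object with data, target, and metadata
bunch = load_breast_cancer()
print(type(bunch).__name__)  # Bunch

# return_X_y=True: a (data, target) tuple of NumPy arrays
X, y = load_breast_cancer(return_X_y=True)
print(X.shape, y.shape)  # (569, 30) (569,)
```

The tuple form is convenient when you only need arrays to feed a model; the Bunch form is useful when you also want the feature names, class names, or description.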
Syntax for loading the dataset
from sklearn.datasets import load_breast_cancer
Features
This is a binary classification dataset.
It has no missing attribute values or null values.
The class distribution is as follows (569 samples in total):
- 212: malignant
- 357: benign
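One way to check this distribution yourself is to count the labels with NumPy, as in the short sketch below. Using np.bincount is one choice among several; collections.Counter would work equally well.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# Count how many samples carry each integer label (0 and 1)
counts = np.bincount(data.target)

# target_names maps each label index to its class name
for name, count in zip(data.target_names, counts):
    print(f"{name}: {count}")
# malignant: 212
# benign: 357
```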
This is a commonly used dataset. Machine learning papers have also used this dataset to address regression problems.
All 30 features are real-valued numbers.
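The following sketch inspects the feature matrix to confirm its size, its numeric type, and the absence of missing values. Treating NaN as the marker for a missing entry is an assumption of this check.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X = data.data

print(X.shape)  # (569, 30): 569 samples, 30 features
print(X.dtype)  # float64

# Count missing values, assuming they would appear as NaN
print(int(np.isnan(X).sum()))  # 0
```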
Code
Load the dataset:
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
print(data)
print(data.keys())
Running this code prints the full dataset object, followed by its keys. The main keys are:
- data: The feature matrix, with one row per sample and one column per feature. These are the measurements used to classify a scan as benign or malignant, so this key is also called the feature data.
- target: The class label of each sample, encoded as 0 for malignant and 1 for benign.
- target_names: The names of the target classes, malignant and benign.
- feature_names: The names of all 30 features, built from measurements such as radius, texture, perimeter, area, smoothness, compactness, concavity, and concave points.
- DESCR: A full text description of the dataset.
- filename: The CSV file from which the data is loaded.
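The sketch below reads a few of these keys. A Bunch behaves like a dictionary whose keys can also be accessed as attributes, so data["DESCR"] and data.DESCR refer to the same value. The exact set of keys printed may vary slightly between scikit-learn versions.

```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# All keys stored in the Bunch
print(sorted(data.keys()))

# Class names, indexed by label: 0 -> malignant, 1 -> benign
print(data.target_names)  # ['malignant' 'benign']

# Each base measurement appears three times: "mean", "error", and "worst"
print(data.feature_names[:4])  # ['mean radius' 'mean texture' 'mean perimeter' 'mean area']
print(len(data.feature_names))  # 30

# Translate the first five numeric labels into class names
print(data.target_names[data.target[:5]])

# First line of the text description (dict-style access)
print(data["DESCR"].splitlines()[0])
```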