How to use the fetch_20newsgroups() function
Overview
The 20 newsgroups dataset is a collection of newsgroup posts commonly used for text classification problems. It has 20 classes, 18846 observations, and features in the form of strings.
The fetch_20newsgroups() function loads the filenames and data from this dataset. On first use, it downloads the dataset from the original 20 newsgroups website and caches it locally.
Syntax
sklearn.datasets.fetch_20newsgroups(*, data_home=None, subset='train', categories=None, shuffle=True, random_state=42, remove=(), download_if_missing=True, return_X_y=False)
Parameters
It takes the following arguments, among others:
- data_home: The directory in which to download and cache the dataset. By default, it's '~/scikit_learn_data'.
- subset: Selects which part of the dataset to load: 'train', 'test', or 'all'. By default, its value is 'train'.
- categories: If None, all the categories of the dataset are loaded. Otherwise, it takes a list of category names to load.
- shuffle: Whether or not to shuffle the dataset when loading it into the program. Its default value is True.
- download_if_missing: Its default value is True. If set to False, the function raises an error instead of downloading the dataset when it isn't available locally.
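The parameters above can be combined in a small wrapper. The following is a minimal sketch, not part of scikit-learn: the helper name load_newsgroups and its allow_download argument are hypothetical, and the choice to return None on failure is an assumption made for illustration.

```python
import tempfile

from sklearn.datasets import fetch_20newsgroups


def load_newsgroups(categories=None, subset="train", data_home=None,
                    allow_download=True):
    """Hypothetical helper: return the requested data, or None when the
    dataset can't be read from the cache or downloaded."""
    try:
        return fetch_20newsgroups(
            data_home=data_home,          # where the cache lives
            subset=subset,                # 'train', 'test', or 'all'
            categories=categories,        # None loads every category
            shuffle=True,
            random_state=42,
            download_if_missing=allow_download,
        )
    except OSError:
        # Raised when the data is missing and downloading is disabled;
        # network failures during a download are also OSError subclasses.
        return None


# Offline check: an empty cache directory with downloading disabled.
with tempfile.TemporaryDirectory() as empty_cache:
    missing = load_newsgroups(data_home=empty_cache, allow_download=False)

print(missing)  # prints None because nothing is cached in that directory
```

With downloading allowed, a call such as load_newsgroups(categories=["sci.space", "rec.autos"], subset="test") would load only those two categories from the test split.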
Return value
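It returns a Bunch, a dictionary-like object whose fields (data, target, target_names, filenames, and DESCR) can be read either as keys or as attributes.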
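To illustrate what "dictionary-like" means here without downloading anything, the sketch below builds a small Bunch by hand. The texts and labels are made-up placeholders, not real dataset content. Only the field names mirror what the loader returns.

```python
from sklearn.utils import Bunch

# Toy stand-in with made-up values; the real object holds thousands of posts.
toy = Bunch(
    data=["first post text", "second post text"],
    target=[1, 0],
    target_names=["rec.autos", "sci.space"],
)

# Each field can be read as an attribute or as a dictionary key.
print(toy.data[0])
print(toy["target_names"])
print(sorted(toy.keys()))

# Map the integer label of the first document back to its category name.
first_label = toy.target_names[toy.target[0]]
print(first_label)
```

The same access pattern applies to the object returned by fetch_20newsgroups(), for example data.target_names or data["target_names"].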
Example
from sklearn.datasets import fetch_20newsgroups
import pandas as pd

# fetch 20 newsgroups dataset
data = fetch_20newsgroups()
# print dataset on console
print(data)
Explanation
- Lines 1–2: We import the fetch_20newsgroups() function from the sklearn.datasets module, along with pandas.
- Line 5: We invoke the fetch_20newsgroups() function to load the 20 newsgroups dataset into the program.
- Line 7: We print the loaded dataset to the console.
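The example imports pandas without using it. One possible use, sketched here as an assumption rather than as part of the original example, is to arrange the loaded fields into a DataFrame. The helper name bunch_to_frame and the column names are hypothetical.

```python
import pandas as pd


def bunch_to_frame(bunch):
    """Hypothetical helper: one row per document, with its text, integer
    label, and the category name that the label refers to."""
    return pd.DataFrame({
        "text": bunch.data,
        "target": bunch.target,
        "category": [bunch.target_names[i] for i in bunch.target],
    })
```

Passing the object returned by fetch_20newsgroups() to bunch_to_frame() would give a table that can be inspected with the usual pandas tools, such as head() or value_counts() on the category column.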