What is sklearn.datasets.fetch_olivetti_faces() in Python?

fetch_olivetti_faces() from the sklearn.datasets module loads the Olivetti faces dataset from AT&T into your program, downloading it first if it is not already available locally. This dataset is intended for classification problems.

Dataset overview

The Olivetti faces dataset contains 10 different 64x64 images for each of 40 distinct subjects or classes. The images of each subject vary in lighting conditions, facial expressions, etc. In total, it contains 400 samples (images per subject * classes) with a dimensionality of 4096 (64 * 64 pixels). The target values are integers between 0 and 39 that indicate the identity of the person pictured.
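
The counts above follow from simple arithmetic, which can be checked with a short sketch. The constant and function names here (N_SUBJECTS, expected_shapes, matches_overview, check_real_dataset) are illustrative and not part of scikit-learn. check_real_dataset() assumes that scikit-learn is installed and that the dataset can be downloaded or is already cached.

```python
import numpy as np

# Figures taken from the dataset overview
N_SUBJECTS = 40          # distinct people (classes)
IMAGES_PER_SUBJECT = 10  # images of each person
IMAGE_SIDE = 64          # images are 64x64 pixels


def expected_shapes():
    """Derive the array shapes implied by the overview figures."""
    n_samples = N_SUBJECTS * IMAGES_PER_SUBJECT   # 40 * 10 = 400
    n_features = IMAGE_SIDE * IMAGE_SIDE          # 64 * 64 = 4096
    return {
        "data": (n_samples, n_features),
        "images": (n_samples, IMAGE_SIDE, IMAGE_SIDE),
        "target": (n_samples,),
    }


def matches_overview(data, images, target):
    """Return True if the arrays have the shapes and labels described above."""
    shapes = expected_shapes()
    labels_ok = np.array_equal(np.unique(target), np.arange(N_SUBJECTS))
    return bool(
        data.shape == shapes["data"]
        and images.shape == shapes["images"]
        and target.shape == shapes["target"]
        and labels_ok
    )


def check_real_dataset():
    """Run the check on the real dataset (may trigger a download)."""
    from sklearn.datasets import fetch_olivetti_faces

    faces = fetch_olivetti_faces()
    return matches_overview(faces.data, faces.images, faces.target)
```

Calling check_real_dataset() is expected to return True if the downloaded arrays match the figures quoted in this section.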

Syntax

sklearn.datasets.fetch_olivetti_faces(*,
   data_home=None,
   shuffle=False,
   random_state=0,
   download_if_missing=True,
   return_X_y=False
 )

Parameters

data_home: This parameter is of type str and its default value is None. It specifies another download and cache folder for the datasets.
shuffle: This parameter is of type bool and its default value is False. If we set its value to True, the order of the dataset is shuffled so that images of the same person are not grouped together.
random_state: This parameter is of type int, RandomState instance, or None, and its default value is 0. It determines the random number generation used to shuffle the dataset. Passing an int gives the same order on every call.
download_if_missing: This parameter is of type bool and its default value is True. If it is False, an OSError (IOError is an alias of OSError in Python 3) is raised when the data is not available locally, instead of downloading it.
return_X_y: This parameter is of type bool and its default value is False. If it is True, the tuple (data, target) is returned instead of the Bunch object.
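
To illustrate how shuffle, random_state, and return_X_y work together, here is a minimal sketch. The helper names (shuffle_together, load_shuffled_faces) are hypothetical. shuffle_together only imitates the effect of shuffle=True on a toy array using NumPy and is not scikit-learn's internal code. load_shuffled_faces() calls the real function and assumes the dataset can be downloaded or is already cached.

```python
import numpy as np


def shuffle_together(data, target, random_state=0):
    """Apply one random permutation to both arrays so rows and labels stay paired."""
    rng = np.random.RandomState(random_state)
    order = rng.permutation(len(target))
    return data[order], target[order]


def load_shuffled_faces(seed=0):
    """Fetch the faces as a (data, target) tuple in a reproducible shuffled order."""
    from sklearn.datasets import fetch_olivetti_faces

    return fetch_olivetti_faces(shuffle=True, random_state=seed, return_X_y=True)


# Toy stand-in for the dataset: 3 "subjects" with 2 samples each, grouped by label
toy_target = np.array([0, 0, 1, 1, 2, 2])
toy_data = toy_target.reshape(-1, 1) * 10.0   # each row encodes its own label

shuffled_data, shuffled_target = shuffle_together(toy_data, toy_target, random_state=0)
print(shuffled_target)  # same labels as before, in a permuted order
```

With the real dataset, X, y = load_shuffled_faces(seed=42) should give X with shape (400, 4096) and y with shape (400,), and using the same seed again should reproduce the same order.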

Return value

data: A dictionary-like Bunch object with the following attributes:

data: ndarray of shape (400, 4096). Every row of this attribute corresponds to a flattened face image with an original size of 64x64 pixels.
target: ndarray of shape (400,). These are the labels associated with the face images. The labels range from 0 to 39 and correspond to the subject IDs.
images: ndarray of shape (400, 64, 64). Every entry is a face image that corresponds to one of the 40 subjects or classes of the dataset.
DESC: The description of the modified Olivetti Faces Dataset.
(data, target): A tuple that is returned instead of the Bunch object if return_X_y is set to True.

Note: The shape (400,) denotes a one-dimensional array of 400 labels.
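
The relationship between the data and images attributes can be sketched with NumPy alone: each row of data is the corresponding image flattened row by row, so reshaping a row recovers the image. The helper names below (images_to_rows, rows_to_images) are hypothetical, and the small random array merely stands in for the real images attribute.

```python
import numpy as np


def images_to_rows(images):
    """Flatten (n_samples, height, width) images into (n_samples, height * width) rows."""
    return images.reshape(len(images), -1)


def rows_to_images(data, side=64):
    """Reshape flattened rows back into square images of the given side length."""
    return data.reshape(len(data), side, side)


# Stand-in for the real data: 5 random 64x64 "images" with values in [0, 1)
rng = np.random.RandomState(0)
fake_images = rng.rand(5, 64, 64).astype(np.float32)

fake_data = images_to_rows(fake_images)
print(fake_data.shape)  # (5, 4096)

restored = rows_to_images(fake_data)
print(np.array_equal(restored, fake_images))  # True
```

On the real dataset, the same idea means that, for faces = fetch_olivetti_faces(), faces.data[i].reshape(64, 64) should equal faces.images[i].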

Explanation

The code below shows fetch_olivetti_faces() in use: it loads the dataset and displays the first six face images as a grid.

# Load useful libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_olivetti_faces
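# Check whether an image is a true color (RGB) image
def is_colored(image):
    # A color image has three dimensions: height, width, and channels
    if len(image.shape) == 3:
        R, G, B = image[:, :, 0], image[:, :, 1], image[:, :, 2]
        # Identical channels mean the image is effectively grayscale
        if (R == G).all() and (G == B).all():
            return False
        return True
    return False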
# Function to show images as a grid
def show_images(images, grid=True, total_cols=2, figsize=(30, 20)):
    assert len(images) > 0
    assert isinstance(images[0], np.ndarray)
    # Number of images to display (6 in this example)
    totalImages  = len(images)
    total_cols    = min(totalImages, total_cols)
    # Round up so that a partially filled last row still gets its own row
    total_rows    = int(totalImages / total_cols) + (1 if totalImages % total_cols != 0 else 0)
    # Create a grid of subplots.
    fig, axes = plt.subplots(total_rows, total_cols, figsize=figsize)
    # Create list of axes for easy iteration.
    if isinstance(axes, np.ndarray):
        list_axes = list(axes.flat)
    else:
        list_axes = [axes]
    # Draw each image on its own subplot
    for i in range(totalImages):
        img    = images[i]
        list_axes[i].imshow(img, cmap='gray')
        list_axes[i].grid(grid)
    # Hide any subplots that are left unused
    for i in range(totalImages, len(list_axes)):
        list_axes[i].set_visible(False)
# Load the dataset (downloaded on first use if it is not available locally)
image_data = fetch_olivetti_faces()
# Create a list of the first 6 images
images = list(image_data.images[:6])
# Use the show_images function to display the images
show_images(images, figsize=(30, 20))
# Render the figure when running as a script
plt.show()
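
With the default shuffle=False, the labels are grouped by person, so the first six images all show the same subject. As an alternative, here is a hedged sketch that picks one image per subject using the target array. The helper name first_image_per_subject is hypothetical, and the toy arrays below only stand in for image_data.images and image_data.target.

```python
import numpy as np


def first_image_per_subject(images, target, n_subjects=6):
    """Return the first image found for each of the first n_subjects labels."""
    selected = []
    for label in np.unique(target)[:n_subjects]:
        # np.argmax on a boolean array gives the index of the first True
        first_index = int(np.argmax(target == label))
        selected.append(images[first_index])
    return selected


# Toy stand-in: 4 "subjects" with 3 tiny 2x2 images each, grouped by label
toy_target = np.repeat(np.arange(4), 3)   # [0 0 0 1 1 1 2 2 2 3 3 3]
toy_images = np.arange(12 * 4, dtype=float).reshape(12, 2, 2)

picked = first_image_per_subject(toy_images, toy_target, n_subjects=3)
print(len(picked))  # 3
```

With the real data, first_image_per_subject(image_data.images, image_data.target) could be passed to show_images() in place of the first six images.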

License: Creative Commons-Attribution-ShareAlike 4.0 (CC-BY-SA 4.0)

Characteristic	Value
Classes	40
Total samples	400
Dimensionality	4096
Features	real values between 0 and 1