Trusted answers to developer questions
Related Tags

sklearn
communitycreator

What is sklearn.datasets.fetch_olivetti_faces() in Python?

Salman Yousaf

fetch_olivetti_faces() from the sklearn.datasets module loads the Olivetti faces dataset from AT&T into your program, downloading it first if it is not cached locally. This dataset is typically used for classification problems.

Dataset overview

The Olivetti faces dataset contains 10 different 64x64 images of each of 40 distinct subjects (classes). The images of a subject vary in lighting conditions, facial expressions, etc. In total, it contains 400 samples (images per subject * classes) with a dimensionality of 4096 (64 * 64 pixels). The target values are integers between 0 and 39 that indicate the identity of the person pictured.

Details

Type             Description
Classes          40
Total samples    400
Dimensionality   4096
Features         Real values between 0 and 1
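
The figures in the table can be checked directly from the arrays. The following is a minimal sketch rather than part of the original answer: describe_faces() and load_description() are hypothetical helper names, and load_description() assumes that scikit-learn is installed and that the dataset is cached locally or can be downloaded.

```python
import numpy as np


def describe_faces(data, target):
    """Summarize a (samples, features) array and its integer labels."""
    classes, counts = np.unique(target, return_counts=True)
    return {
        "classes": int(classes.size),
        "total_samples": int(data.shape[0]),
        "dimensionality": int(data.shape[1]),
        # Distinct per-class sample counts, e.g. [10] if every class has 10
        "samples_per_class": sorted(set(counts.tolist())),
        "feature_min": float(data.min()),
        "feature_max": float(data.max()),
    }


def load_description():
    """Fetch the real dataset (downloaded on first use) and summarize it."""
    from sklearn.datasets import fetch_olivetti_faces

    data, target = fetch_olivetti_faces(return_X_y=True)
    return describe_faces(data, target)
```

Calling load_description() on the real dataset should report 40 classes, 400 samples, and a dimensionality of 4096, in line with the table.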

Syntax


sklearn.datasets.fetch_olivetti_faces(*,
   data_home=None,
   shuffle=False,
   random_state=0,
   download_if_missing=True,
   return_X_y=False
 )

Parameters

  • data_home: This parameter is of type str and its default value is None. It specifies another download and cache folder for the datasets.
  • shuffle: This parameter is of type bool and its default value is False. If it is True, the order of the images is shuffled so that images of the same person are not grouped together.
  • random_state: This parameter accepts an int, a RandomState instance, or None, and its default value is 0. It determines the random number generation used to shuffle the dataset.
  • download_if_missing: This parameter is of type bool and its default value is True. If it is False and the data is not available locally, an IOError (an alias of OSError in Python 3) is raised instead of downloading the data.
  • return_X_y: This parameter is of type bool and its default value is False. If it is True, the tuple (data, target) is returned instead of the Bunch object.
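
To illustrate what shuffle changes, the sketch below checks whether the labels of a dataset are still grouped by subject. It is an illustrative assumption, not part of the original answer: is_grouped_by_subject() and load_faces() are hypothetical helper names, and the seed 42 is an arbitrary choice. load_faces() calls fetch_olivetti_faces() with the parameters described above and needs the dataset to be cached or downloadable.

```python
import numpy as np


def is_grouped_by_subject(target):
    """Return True if all samples of each subject sit next to each other."""
    target = np.asarray(target)
    # Count positions where the label changes from one sample to the next.
    changes = int(np.count_nonzero(target[1:] != target[:-1]))
    # Fully grouped labels change exactly (number of subjects - 1) times.
    return changes == np.unique(target).size - 1


def load_faces(shuffle=True, random_state=42):
    """Fetch (data, target); the dataset is downloaded on first use."""
    from sklearn.datasets import fetch_olivetti_faces

    return fetch_olivetti_faces(
        shuffle=shuffle, random_state=random_state, return_X_y=True
    )
```

With shuffle=False, the labels are expected to come back grouped, so is_grouped_by_subject(y) should be True for the returned target y. With shuffle=True, it should be False, and passing the same random_state should reproduce the same order.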

Return value

Data: A dictionary-like object (Bunch) with the following attributes:

  • data: ndarray of shape (400, 4096). Each row corresponds to one face image of the original size 64x64 pixels, flattened into a vector.

  • target: ndarray of shape (400,). The label associated with each face image. The labels range from 0 to 39 and correspond to the subject IDs.

  • images: ndarray of shape (400, 64, 64). Each entry is a face image that corresponds to one of the 40 subjects (classes) of the dataset.

  • DESCR: The description of the modified Olivetti faces dataset.

  • (data, target): A tuple that is returned instead of the object above if return_X_y is True.

Note: A shape of (400,) denotes a one-dimensional array of 400 labels.
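
The data and images attributes hold the same pixels in two layouts: flat rows of 4096 values and 64x64 arrays. The sketch below converts between the two with NumPy alone. row_to_image() and images_to_rows() are hypothetical helper names introduced for illustration.

```python
import numpy as np

IMAGE_SHAPE = (64, 64)  # height and width of one Olivetti face


def row_to_image(row):
    """Reshape one flat row of 4096 values into a 64x64 image."""
    row = np.asarray(row)
    if row.size != IMAGE_SHAPE[0] * IMAGE_SHAPE[1]:
        raise ValueError(f"expected 4096 values, got {row.size}")
    return row.reshape(IMAGE_SHAPE)


def images_to_rows(images):
    """Flatten an (n, 64, 64) stack of images into an (n, 4096) array."""
    images = np.asarray(images)
    return images.reshape(len(images), -1)
```

For the object returned by fetch_olivetti_faces(), images_to_rows(faces.images) is expected to match faces.data, and row_to_image(faces.data[i]) is expected to match faces.images[i].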

Explanation

The code below demonstrates how the fetch_olivetti_faces() function works.

# Load useful libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_olivetti_faces

# Check whether an image is colored: three channels that are not all identical
def is_colored(image):
    # Check for three channels
    if len(image.shape) == 3:
        R, G, B = image[:, :, 0], image[:, :, 1], image[:, :, 2]
        if not ((R == G).all() and (G == B).all()):
            return True
    return False

# Function to show images as a grid
def show_images(images, grid=True, total_cols=2, figsize=(30, 20)):
    assert len(images) > 0
    assert isinstance(images[0], np.ndarray)
    # Number of images to display (6 in this demo)
    totalImages = len(images)
    total_cols = min(totalImages, total_cols)
    total_rows = int(totalImages / total_cols) + (1 if totalImages % total_cols != 0 else 0)
    # Create a grid of subplots.
    fig, axes = plt.subplots(total_rows, total_cols, figsize=figsize)
    # Create list of axes for easy iteration.
    if isinstance(axes, np.ndarray):
        list_axes = list(axes.flat)
    else:
        list_axes = [axes]
    # Draw each image in its own cell of the grid
    for i in range(totalImages):
        img = images[i]
        list_axes[i].imshow(img, cmap='gray')
        list_axes[i].grid(grid)
    for i in range(totalImages, len(list_axes)):  # hide unused cells
        list_axes[i].set_visible(False)

# Load the dataset
image_data = fetch_olivetti_faces()
# Create a list of the first 6 images
images = list(image_data.images[:6])
# Use show_images() to display the images
show_images(images, figsize=(30, 20))
plt.show()  # render the figure when run as a script
  • Lines 7-13: The is_colored() helper is meant for detecting color (three-channel) images. The demo does not call it, because the Olivetti images are grayscale.
  • Lines 15-36: The show_images() function displays a list of images as a grid of subplots.
  • Line 39: Fetches the Olivetti faces dataset from the AT&T archive and stores it in image_data.
  • Line 41: Creates a list of the first 6 images and stores it in the images variable.
  • Line 43: Calls show_images() to display the images as a grid.
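
Because target stores the subject ID of each image, the images to display can also be selected per person instead of by position. The following is a small sketch under that assumption. images_of_subject() is a hypothetical helper name and is not part of scikit-learn.

```python
import numpy as np


def images_of_subject(images, target, subject_id):
    """Return every image whose label equals subject_id."""
    images = np.asarray(images)
    target = np.asarray(target)
    # A boolean mask over the first axis keeps only the matching images.
    return images[target == subject_id]
```

For example, images_of_subject(image_data.images, image_data.target, 0) should return the 10 images of the first subject, which can then be passed to show_images() as a list.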
