Color Spaces

A color space is the internal representation of colors in an image. The human eye perceives color as a mixture of signals from three types of sensors in the retina, each tuned to a different band of the spectrum. In a digital image, the color of a pixel is encoded as a triple of numbers. That’s why the shape of a color image is (H, W, C), with the number of channels (C) equal to 3.

The interpretation of these three numbers differs from one color space to another. We can think of it as analogous to the two numbers used as coordinates of a point in a plane. Cartesian coordinates are the distances from the point to the two reference axes. In polar coordinates, the first coordinate is the distance from the point to the origin, and the second is the angle between the line joining the origin $(0, 0)$ to the point and one of the reference axes. The same goes for color spaces. They encode color as a triple of numbers whose meaning or ordering varies from one space to another. This lesson focuses on the color spaces most often encountered in automated inspection projects.

BGR and RGB

BGR and RGB, the most common color spaces, encode colors as Cartesian coordinates in a 3D space.

In lines 4–6, we extract channels 0, 1, and 2. They correspond, respectively, to the blue, green, and red channels.

As expected, because the letters in the original image are purely blue, green, or red, each channel has a value of 255 in the area of its own letter and zero everywhere else. The individual channels are saved as monochrome images, so the letters appear white.
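As a minimal sketch of this channel extraction, the snippet below builds a small synthetic BGR image with NumPy rather than loading the lesson's image. The patch layout and the variable names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical stand-in for the lesson's image: three solid patches
# (pure blue, pure green, pure red) stored in BGR channel order.
height, patch_width = 40, 50
bgr_img = np.zeros((height, 3 * patch_width, 3), dtype=np.uint8)
bgr_img[:, 0:patch_width] = (255, 0, 0)                # blue patch
bgr_img[:, patch_width:2 * patch_width] = (0, 255, 0)  # green patch
bgr_img[:, 2 * patch_width:] = (0, 0, 255)             # red patch

# Index the last axis to pull out each channel as a 2D array.
blue_channel = bgr_img[:, :, 0]
green_channel = bgr_img[:, :, 1]
red_channel = bgr_img[:, :, 2]

print(bgr_img.shape)       # (40, 150, 3)
print(blue_channel.shape)  # (40, 150)
print(blue_channel[0, 0], blue_channel[0, 60])  # 255 0
```

Each extracted channel is a single-channel array, which is why it displays as a monochrome image with the matching patch in white.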

The OpenCV library uses BGR by default, but we may need to interact with another image library that expects RGB images. To convert from BGR to RGB, we can swap the blue and red channels.

In line 4, we create an RGB image by calling cv2.cvtColor() with the code=cv2.COLOR_BGR2RGB argument.

Monochrome, a.k.a. grayscale

A special case often encountered in automated inspection is when an image has a single channel. We say that such an image is monochrome or grayscale. We can convert a color image to a grayscale image with cv2.cvtColor(), passing the code=cv2.COLOR_BGR2GRAY argument.

The conversion from color to grayscale is a linear combination of the three channels, with weights chosen so that the grayscale value approximates the brightness the human eye perceives.
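As an illustration, the weights that OpenCV documents for its RGB-to-gray conversion are 0.299 for red, 0.587 for green, and 0.114 for blue. The sketch below applies them by hand with NumPy to three saturated pixels. It approximates what cv2.cvtColor() computes and is not its actual implementation; the variable names are assumptions.

```python
import numpy as np

# One row of three saturated pixels in BGR order: blue, green, red.
bgr_pixels = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)

# Weights listed in BGR order (blue, green, red).
bgr_weights = np.array([0.114, 0.587, 0.299])

# Weighted sum over the channel axis, rounded to the nearest integer.
gray = np.rint(bgr_pixels.astype(np.float64) @ bgr_weights).astype(np.uint8)

print(gray.shape)     # (1, 3)
print(gray.tolist())  # [[29, 150, 76]]
```

The green pixel maps to the highest gray level and the blue pixel to the lowest, even though all three are equally saturated in their own channel.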

In line 5, we create a grayscale image by calling cv2.cvtColor() with the code=cv2.COLOR_BGR2GRAY argument.

Although the three letters were saturated (i.e., they had a value of 255 in their respective channels), the letter “G” is brighter in the grayscale image. This feature reflects the human eye’s greater sensitivity to green light.

The shape of the grayscale image is (152, 418), while that of the color image is (152, 418, 3). We see that the conversion to grayscale made our image single-channel. The shape (152, 418) is the squeezed representation of (152, 418, 1).

As you can guess, we lost information in the process. If we convert our grayscale image back to BGR, we get a three-channel image whose (b, g, r) planes are copies of each other. The color image looks just like the grayscale image.
