What is SIFT?

Overview

Scale-Invariant Feature Transform (SIFT) was introduced by D. Lowe, a former professor at the University of British Columbia, who first described it in 1999 and published the full version in 2004. SIFT is a feature extraction method that reduces the image content to a set of points that can be used to detect similar patterns in other images. The algorithm is commonly used in computer vision applications, including image matching and object detection.

Key terminologies

Feature extraction: Feature extraction methods reduce the number of features in a dataset by passing the data through a mapping function.
Key points: An image's key points are spatial locations that can be found reliably regardless of the image's rotation and scale. They highlight what stands out in an image and which pixels matter most.
Descriptors: Descriptors are vectors that describe the local surroundings around the key points present in the image. These descriptors are used to make associations between different images.
Gaussian Blur: This is a method used to reduce the noise in the image, with the help of the Gaussian function, so that the key points can be detected efficiently.

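This process (blurring the image at successive scales and subtracting adjacent blurred images to obtain the Difference of Gaussian) is repeated for all four octaves, and the resulting images are then used to find key points.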
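The scale-space and Difference of Gaussian (DoG) steps referred to here can be sketched in plain NumPy. This is a minimal sketch under stated assumptions, not a reference implementation: the function names are hypothetical, the values `sigma=1.6`, a scale step of √2, and five blur levels per octave are conventional choices rather than values taken from this article, and each new octave is produced by simply halving the input image.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel covering about +/- 3 sigma."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_blur(image, sigma):
    """Blur a 2-D float image with a separable Gaussian (edge-padded)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    padded = np.pad(image, radius, mode="edge")
    # Convolve along rows, then along columns; "valid" removes the padding.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def dog_octave(image, sigma=1.6, scales=5):
    """Blur one octave at increasing scales and subtract adjacent levels."""
    k = 2 ** 0.5  # assumed multiplicative step between scales
    blurred = [gaussian_blur(image, sigma * k ** i) for i in range(scales)]
    return [upper - lower for lower, upper in zip(blurred, blurred[1:])]

def build_dog_pyramid(image, octaves=4, sigma=1.6, scales=5):
    """Return a list of octaves, each a list of DoG images."""
    pyramid = []
    current = image.astype(np.float64)
    for _ in range(octaves):
        pyramid.append(dog_octave(current, sigma, scales))
        current = current[::2, ::2]  # simplification: halve the resolution
    return pyramid
```

With five blur levels, each octave yields four DoG images, and the key point search in the next step runs over those.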

Key point localization

After our images are processed, we must locate the key points in the image. This is done by comparing each pixel value with the pixels in its locality: its 8 neighbors in the same image, plus the 9 pixels at the corresponding position in each of the two adjacent scales of the octave (18 in total). So, 8 + 18 = 26 comparisons are made to check whether a point is a local extremum, and only then is it classified as a potential key point.
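The 26-neighbor comparison can be illustrated with a short NumPy sketch. The function name and argument layout are hypothetical; `below`, `current`, and `above` are assumed to be three adjacent DoG images of the same octave, and `(row, col)` is assumed to be at least one pixel away from the border.

```python
import numpy as np

def is_local_extremum(below, current, above, row, col):
    """True if current[row, col] is strictly larger or strictly smaller
    than all 26 neighbors in the 3 x 3 x 3 cube around it."""
    cube = np.stack([
        below[row - 1:row + 2, col - 1:col + 2],
        current[row - 1:row + 2, col - 1:col + 2],
        above[row - 1:row + 2, col - 1:col + 2],
    ])
    center = cube[1, 1, 1]
    # Index 13 is the center of the flattened 27-element cube; drop it.
    neighbors = np.delete(cube.ravel(), 13)
    return bool(center > neighbors.max() or center < neighbors.min())
```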

However, this process results in a lot of key points being generated. So, we drop the key points which do not have enough contrast or are lying along an edge in our test image. This way we get a set of legitimate key points for our test image.
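One way to sketch the two rejection tests is shown below. The thresholds are assumptions: a contrast threshold of 0.03 (for pixel values scaled to [0, 1]) and an edge ratio of 10 are the values commonly quoted from Lowe's paper, not values given in this article. The function name is hypothetical, and the sketch tests the sampled pixel directly rather than a sub-pixel refined location.

```python
import numpy as np

def passes_contrast_and_edge_tests(dog, row, col,
                                   contrast_threshold=0.03, edge_ratio=10.0):
    """Keep a candidate only if it has enough contrast and is not edge-like."""
    value = dog[row, col]
    if abs(value) < contrast_threshold:
        return False  # low contrast: likely noise
    # 2 x 2 Hessian of the DoG image, estimated by finite differences.
    dxx = dog[row, col + 1] - 2 * value + dog[row, col - 1]
    dyy = dog[row + 1, col] - 2 * value + dog[row - 1, col]
    dxy = (dog[row + 1, col + 1] - dog[row + 1, col - 1]
           - dog[row - 1, col + 1] + dog[row - 1, col - 1]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy ** 2
    if det <= 0:
        return False  # curvatures differ in sign or one is zero: reject
    # An edge has one large and one small curvature, giving a large ratio.
    return bool(trace ** 2 / det < (edge_ratio + 1) ** 2 / edge_ratio)
```

A blob-like point curves strongly in every direction and passes the test, while a point on a straight edge curves in only one direction and is dropped.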

Orientation assignment

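To make the set of legitimate key points invariant to rotation, we must assign each of them an orientation. This is derived from the gradient magnitude (how strongly the intensity changes) and the gradient orientation (the direction of that change) of the pixels around the key point. For a Gaussian-blurred image L, the magnitude m and orientation θ at pixel (x, y) are calculated as follows:

\[ m(x, y) = \sqrt{\left(L(x+1, y) - L(x-1, y)\right)^2 + \left(L(x, y+1) - L(x, y-1)\right)^2} \]

\[ \theta(x, y) = \tan^{-1}\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right) \]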

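These values are collected into a histogram of gradient orientations around the key point (36 bins of 10 degrees each in Lowe's paper), with each sample weighted by its magnitude. The histogram will have a peak value, which represents the orientation of the key point to the resolution of its bin. Any other peak above 80% of the highest peak creates an additional key point at the same location and scale, but with its own orientation.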
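The magnitude, orientation, and histogram steps can be sketched together as follows. This is a simplified illustration: the function names are hypothetical, the 36-bin layout follows Lowe's paper rather than this article, and the Gaussian weighting and peak interpolation of the full algorithm are left out.

```python
import numpy as np

def gradient_magnitude_orientation(L):
    """Central-difference gradients for the interior pixels of image L.
    Returns magnitude and orientation (degrees in [0, 360))."""
    dx = L[1:-1, 2:] - L[1:-1, :-2]   # L(x+1, y) - L(x-1, y)
    dy = L[2:, 1:-1] - L[:-2, 1:-1]   # L(x, y+1) - L(x, y-1)
    magnitude = np.sqrt(dx ** 2 + dy ** 2)
    orientation = np.degrees(np.arctan2(dy, dx)) % 360.0
    return magnitude, orientation

def dominant_orientations(magnitude, orientation, bins=36, peak_ratio=0.8):
    """Return bin-center angles of local histogram peaks that reach
    at least peak_ratio of the highest peak."""
    hist, edges = np.histogram(orientation, bins=bins, range=(0.0, 360.0),
                               weights=magnitude)
    centers = (edges[:-1] + edges[1:]) / 2
    # A peak must exceed both circular neighbors in the histogram.
    is_peak = (hist > np.roll(hist, 1)) & (hist > np.roll(hist, -1))
    strong = hist >= peak_ratio * hist.max()
    return centers[is_peak & strong]
```

Each returned angle would correspond to one key point; a patch with two strong gradient directions therefore produces two key points at the same position.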

Key point descriptor

The last step in the SIFT algorithm is to make a descriptor. The pixels surrounding each key point are used to make its descriptor. Hence, the descriptors are invariant to viewpoint and illumination to a certain extent. A 16 x 16 window is formed around the key point and divided into 16 sub-blocks of 4 x 4 pixels each.

An 8-bin histogram of gradient orientations, weighted by magnitude, is created for each sub-block; therefore, 16 x 8 = 128 bin values are generated to represent a feature vector.

The feature vector's rotation and illumination dependence have to be eliminated to get an accurate descriptor. Rotation dependence is eliminated by subtracting the key point's orientation from each gradient orientation. Similarly, illumination dependence is reduced by normalizing the descriptor vector to unit length, clipping its values at a threshold (0.2 in Lowe's paper), and normalizing it again.
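The descriptor construction described above can be sketched as follows. The function name is hypothetical, the inputs are assumed to be 16 x 16 arrays of gradient magnitudes and orientations (in degrees) already sampled around the key point, and the clipping value of 0.2 is the one commonly quoted from Lowe's paper. The full algorithm also applies Gaussian weighting and interpolation between bins, which this sketch omits.

```python
import numpy as np

def sift_like_descriptor(magnitude, orientation, keypoint_angle=0.0, clip=0.2):
    """Build a 128-value descriptor from a 16 x 16 patch of gradients."""
    # Rotation invariance: measure orientations relative to the key point.
    relative = (orientation - keypoint_angle) % 360.0
    histograms = []
    for r in range(0, 16, 4):
        for c in range(0, 16, 4):
            # One 8-bin, magnitude-weighted histogram per 4 x 4 sub-block.
            hist, _ = np.histogram(relative[r:r + 4, c:c + 4], bins=8,
                                   range=(0.0, 360.0),
                                   weights=magnitude[r:r + 4, c:c + 4])
            histograms.append(hist)
    vector = np.concatenate(histograms).astype(np.float64)  # 16 * 8 = 128
    # Illumination invariance: normalize, clip large values, normalize again.
    vector /= max(np.linalg.norm(vector), 1e-12)
    vector = np.minimum(vector, clip)
    vector /= max(np.linalg.norm(vector), 1e-12)
    return vector
```

Because of the normalization, scaling every gradient magnitude by a constant (a uniform contrast change) leaves the descriptor unchanged.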

Key point matching

The key points and descriptors extracted in the previous steps are compared with those of other images to find matching patterns. This is what makes SIFT useful for object detection and image matching.
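The article does not say how descriptors are compared, so the following is only one common approach: nearest-neighbor search on Euclidean distance, followed by a ratio test that rejects ambiguous matches. The function name and the ratio of 0.75 are assumptions chosen for illustration, and `desc_b` is assumed to contain at least two descriptors.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.75):
    """Return (index_in_a, index_in_b) pairs that pass the ratio test."""
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    distances = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(distances):
        nearest, second = np.argsort(row)[:2]
        # Accept only if the best match is clearly better than the runner-up.
        if row[nearest] < ratio * row[second]:
            matches.append((i, int(nearest)))
    return matches
```

The ratio test discards key points whose best and second-best candidates are almost equally close, which tends to happen on repeated textures.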

Implementation of SIFT

The following is an implementation of SIFT using OpenCV's Python library, cv2 (a third-party package, not part of the Python standard library). The test image is first read and then converted to grayscale, which reduces the number of channels from 3 to 1. The library provides a method to create a SIFT object (cv2.SIFT_create), which is then used to detect the key points and compute the descriptors of the test image in one call (detectAndCompute). Then we use another method, cv2.drawKeypoints, to highlight the key points and superimpose them on the original image. After the SIFT algorithm is executed, a test image with all the highlighted key points is obtained.

Advantages of SIFT

Advantage	Explanation
Locality	The features are local, so they are robust to noise and clutter.
Distinctiveness	Individual features can be matched against a large dataset of objects.
Quantity	SIFT can generate many features even from small objects.
Efficiency	The algorithm runs at close to real-time speed.
