Homographies

Learn to register images of planar objects with affine and perspective transformations.

Locating objects in an image is a task we constantly have to perform in automated inspection. For example, we might know where the features to inspect are in the object’s reference frame. However, the object of interest can appear anywhere in the image under different poses. Consider the pair of images below.

A book viewed from two different angles

Suppose we have a client who wants us to inspect the correct printing of the book title “WE IMAGINE WE DRAW OBJECTS” and the educational series, “BARRON’S”, at the bottom of the book. Unfortunately, the camera can capture images from different points of view, and these two images represent the most extreme cases. Our task is to register these images such that the features appear approximately at the same place, under the same pose.

Imagine a virtual camera looking perpendicularly to each planar surface, with the pixel axes aligned with the book edges. We search for a transformation that warps our original images into an image captured by this virtual camera. If we succeed, the features to inspect should always be more or less in the same image area, aligned with the image axes.

This transformation (a projection from the object plane to a virtual camera image plane) is called a homography. OpenCV offers two functions for warping images with such planar transformations: cv2.warpAffine(), which applies an affine transformation, and cv2.warpPerspective(), which applies a full perspective transformation. We’ll apply both transformations to our images to understand when to use each one, but first, we should look at the geometry of image projection.
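To make the difference between the two kinds of transformation concrete, here is a minimal sketch in plain Python (no OpenCV required). It maps the corners of a square through a 3x3 matrix using homogeneous coordinates. The helper names and the matrix values are made up for illustration and are not taken from the book images. For reference, cv2.warpAffine() expects a 2x3 matrix (the top two rows of the affine case), while cv2.warpPerspective() expects the full 3x3 matrix.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 matrix using homogeneous coordinates."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)  # divide by w to get back to pixel coordinates


def edge(a, b):
    """Vector from point a to point b."""
    return (b[0] - a[0], b[1] - a[1])


def is_parallel(e1, e2, tol=1e-9):
    """Two 2D vectors are parallel when their cross product is zero."""
    return abs(e1[0] * e2[1] - e1[1] * e2[0]) < tol


# Illustrative matrices (hypothetical values).
affine = [[1.0, 0.5, 10.0],
          [0.0, 1.0, 20.0],
          [0.0, 0.0, 1.0]]        # last row fixed to (0, 0, 1)
perspective = [[1.0, 0.5, 10.0],
               [0.0, 1.0, 20.0],
               [0.001, 0.002, 1.0]]  # non-zero terms in the last row

square = [(0, 0), (100, 0), (100, 100), (0, 100)]
affine_corners = [apply_homography(affine, p) for p in square]
perspective_corners = [apply_homography(perspective, p) for p in square]

# Compare the bottom edge (corner 0 -> 1) with the top edge (corner 3 -> 2).
affine_keeps_parallel = is_parallel(
    edge(affine_corners[0], affine_corners[1]),
    edge(affine_corners[3], affine_corners[2]))
perspective_keeps_parallel = is_parallel(
    edge(perspective_corners[0], perspective_corners[1]),
    edge(perspective_corners[3], perspective_corners[2]))

print("affine keeps opposite edges parallel:", affine_keeps_parallel)
print("perspective keeps opposite edges parallel:", perspective_keeps_parallel)
```

With these values, the affine matrix turns the square into a parallelogram, whereas the perspective matrix produces a general quadrilateral. This is why an affine transformation can only register views in which the parallel edges of the book still look parallel.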

Projection matrix of a pinhole camera

A simple model for a camera is the pinhole camera model. Imagine that a camera is made of a light sensor area (the image plane) in a closed box, with only a very small hole (the pinhole) allowing rays of light to reach the image plane. The distance between the image plane and the pinhole is f, the focal distance.

Note: In a real camera, a lens plays the role of the pinhole. The lens allows more light to reach the sensor, at the cost of a shorter depth of field.

Geometry of the pinhole camera model

The pinhole camera projection matrix computes the pixel coordinates of the image of a point from the 3D coordinates of the point. The projection matrix is a 3x4 matrix that we can decompose by the following equation:

  • f/l_x is the ratio of the focal distance to the pixel dimension in x.

  • f/l_y ...
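As a rough illustration of how the ratios f/l_x and f/l_y enter the projection, the sketch below assumes the common textbook decomposition of the 3x4 projection matrix into an intrinsic matrix and an extrinsic matrix. The principal point (c_x, c_y), the identity rotation, the zero translation, and all numeric values are assumptions chosen for the example. They may not match the notation used in this lesson.

```python
def intrinsic_matrix(f, l_x, l_y, c_x, c_y):
    """3x3 intrinsic matrix; f, l_x and l_y must share the same length unit.

    c_x and c_y (the principal point, in pixels) are an assumption of this sketch.
    """
    return [[f / l_x, 0.0, c_x],
            [0.0, f / l_y, c_y],
            [0.0, 0.0, 1.0]]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def project(P, point_3d):
    """Project a 3D point to pixel coordinates with a 3x4 projection matrix."""
    X, Y, Z = point_3d
    homogeneous = [X, Y, Z, 1.0]
    u, v, w = (sum(p * h for p, h in zip(row, homogeneous)) for row in P)
    return (u / w, v / w)  # perspective division


# Hypothetical camera: 8 mm focal distance, 0.004 mm square pixels,
# principal point at the center of a 640x480 image.
K = intrinsic_matrix(f=8.0, l_x=0.004, l_y=0.004, c_x=320.0, c_y=240.0)

# Assumed extrinsics [R | t]: camera at the world origin, looking along +Z.
extrinsic = [[1.0, 0.0, 0.0, 0.0],
             [0.0, 1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, 0.0]]

P = matmul(K, extrinsic)  # 3x4 projection matrix

# A point 2 m in front of the camera, offset 0.1 m in X and 0.05 m in Y.
pixel = project(P, (0.1, 0.05, 2.0))
print(pixel)
```

Because f/l_x is a length divided by a length, it expresses the focal distance in pixels. Doubling the distance Z of the point halves its offset from the principal point in the image.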
