Implementing the Tomasi-Kanade Factorization algorithm in Python

Key takeaways:

The Tomasi-Kanade Factorization algorithm helps solve the rigid structure from motion (RSfM) problem, useful in 3D reconstruction and computer vision.
It takes 2D point correspondences from multiple images and recovers both the 3D coordinates and camera motion.
In the first step, 3D data (a sphere) is visualized using a scatter plot.
Next, 2D points are generated from the 3D points using intrinsic and extrinsic camera matrices.
Singular Value Decomposition (SVD) is applied to decompose the matrix of 2D points to extract 3D structure components.
The reconstructed 3D structure is evaluated by comparing it to the original data using mean Euclidean error.
The algorithm can be used in robotics, augmented reality, and 3D modeling, though additional constraints may be necessary for complex real-world applications.

The Tomasi-Kanade Factorization algorithm is a widely used method in computer vision and 3D reconstruction for solving the problem of the rigid structure from motion (RSFM)It is the process of estimating the 3D structure of an object or scene from a sequence of 2D images taken from different viewpoints, assuming the object's structure is rigid (i.e doesn't deform over time). When we are given a set of 2D point correspondences in multiple images taken from different perspectives, RSfM aims to recover both the 3D coordinates of the points and the camera motion between these viewpoints. This algorithm has various applications, including robotics, augmented reality, and 3D modeling.

Step-by-step procedure

Here, we will focus on implementing the Tomasi-Kanade Factorization algorithm using our Python programming knowledge. The steps to implement this are outlined below.

Step 1: Visualizing the 3D data

To start off, we load our 3D data, which is present in the form of a sphere in a .mat file. This data is then displayed with the help of a scatter plot, which shows the sphere.

import numpy as np
import math
#defining the K matrix with parameters fx=fy=1, cx=cy=0 and skew factor 0
#Here fx,fy are focal lengths of the camera along the x and y axes and cx,cy are the principal point's coordinates
intrinsic = np.array([[1, 0, 0],[0, 1, 0]])
#defining a rotation matrix along the y-dimension
def rotation_matrix_y(angle):
    rot_angle=np.radians(angle)
    R = np.array([[math.cos(rot_angle),0,math.sin(rot_angle)],[0,1,0],[-math.sin(rot_angle),0,math.cos(rot_angle)]])
    return R
#here we are taking 4 2D images oriented at different angles (0, 10, 20 etc.)
R=rotation_matrix_y(0)
R1=rotation_matrix_y(10)
R2=rotation_matrix_y(20)
R3=rotation_matrix_y(30)
#mutliplying the intrinsic matrix with each of the 2D images
R=np.dot(intrinsic,R) 
R1=np.dot(intrinsic,R1)
R2=np.dot(intrinsic,R2)
R3=np.dot(intrinsic,R3)
#creating a 3Dpoints matrix vertically stacking a,b,c on top of each other
threeDpoints=np.vstack([a,b,c])
#creating a trans matrix which is the product of K and R with dimension 2F*3 where F=4 by vertically stacking R, R1, R2, R3 on top of each other
trans=np.vstack([R,R1,R2,R3])
print(np.shape(trans))
#computing new 2D points for the spherical structure
new2Dpoints=np.dot(trans,threeDpoints)
#Pair of x,y dimension of 1st 2D image
a1=new2Dpoints[0] 
b1=new2Dpoints[1]
#Pair of x,y dimensions of 2nd 2D image
c1=new2Dpoints[2]
d1=new2Dpoints[3]
#Pair of x,y dimension of 3rd 2D image
e1=new2Dpoints[4]
f1=new2Dpoints[5]
#Pair of x,y dimension of 4th 2D image
g1=new2Dpoints[6]
h1=new2Dpoints[7]
fig = plt.figure(figsize=(15,25))
ax = fig.add_subplot(411)
#Plotting 2D image with zero degree rotation
ax.set_title('2D image with zero degree rotation')
ax.scatter(a1,b1, marker='*')  
ax1=fig.add_subplot(412)
#Plotting 2D image with ten degree rotation
ax1.set_title('2D image with ten degree rotation')
ax1.scatter(c1,d1,marker='*')  
ax2=fig.add_subplot(413)
#Plotting 2D image with twenty degree rotation
ax2.set_title('2D image with twenty degree rotation')
ax2.scatter(e1,f1,marker='*')
ax3=fig.add_subplot(414)
#Plotting 2D image with thirty degree rotation
ax3.set_title('2D image with thirty degree rotation')
ax3.scatter(g1,h1,marker='*')

#initializing the W matrix
W=new2Dpoints
#applying SVD on the W matrix
u, s, v = np.linalg.svd(np.float32(W))
#constructing a diagonal matrix from the singular values found above
sig=np.diag(s)
# Extracting the first 3 columns of the left singular vectors matrix (u)
Rhatf=u[:,0:3]
# Extracting the first 3x3 submatrix from the diagonal matrix (sig)
s=sig[0:3,0:3]
# Extracting the first 3 rows of the right singular vectors matrix (v)
shat=v[0:3:,]
#number of augmented 2Dimages we took in the previous steps
nimages=4  
#Initializing the A matrix with shape 3*numofimages*6 where num of images is 4
A=np.zeros((3*nimages,6))  
#Loop through the number of images to apply orthogonality constraints
for i in range(nimages):  
  a=2*i
  b=2*i+1
  r1=Rhatf[a,0]
  r2=Rhatf[a,1]
  r3=Rhatf[a,2]
  r4=Rhatf[b,0]
  r5=Rhatf[b,1]
  r6=Rhatf[b,2]
  # Fill the rows of the A matrix with specific values based on orthogonality constraints
  A[a:b+2,:]=np.float32([[r1*r1,2*r1*r2,2*r1*r3,r2*r2,2*r2*r3,r3*r3],
            [r1*r4,(r2*r4)+(r1*r5),(r3*r4)+(r1*r6),r2*r5,(r3*r5)+(r2*r6),r3*r6],
            [r4*r4,2*r4*r5,2*r4*r6,r5*r5,2*r5*r6,r6*r6]])
#multiplying the b vector with n images which will correspond to every 3 rows of A
b=np.array(nimages*[1,0,1])  
#Calculate the pseudo-inverse of the A matrix
Ahat=np.linalg.pinv(A)
#Solving for x using the pseudo-inverse and the transpose of the b vector
x=np.dot(Ahat,np.transpose(b))
#Initializing an empty 3x3 matrix QQT
QQT = np.zeros((3, 3))
#assigning values of QQT matrix with values from the x vector
QQT[0, 0] = x[0]  
QQT[1, 1] = x[3]
QQT[2, 2] = x[5]
QQT[0, 1] = x[1]
QQT[1, 0] = x[1]
QQT[0, 2] = x[2]
QQT[2, 0] = x[2]
QQT[2, 1] = x[4]
QQT[1, 2] = x[4]
# Performing Cholesky decomposition on the QQT matrix
Q=np.linalg.cholesky(QQT)
# Calculating the inverse of the Q matrix to transform shat into the reconstructed structure
Qinv=np.linalg.inv(Q)
# Calculating the reconstructed 3D points by multiplying Qinv and shat
S=np.dot(Qinv,shat)

After decomposing the matrix W, we apply the orthogonality constraintsThese are used to impose mathematical conditions that ensure vectors (in this case within 'A' ) are perpendicular to each other. when populating the matrix A with components from the matrix Rhat. This matrix corresponds to the dominant directions of data vectors, which will help us estimate the reconstructed data’s rotation matrix while reducing dimensionality.
After this, we take the pseudo-inverse of this matrix A because it is initially an over-determined system (a system of equations having more equations than unknown variables).
It is multiplied with the matrix b to get the matrix QQT that will help us recover the covariance matrix of the 3D points, which will be used in estimating the uncertainty of the reconstructed 3D structure in the step after this.
Finally, the Cholesky decompositionThis decompositon method is used for for decomposing a symmetric, positive-definite matrix (in this case QQT) into the product of a lower triangular matrix and the transpose of the lower triangular matrix. is performed on QQT to obtain the lower-triangular matrix Q. This Q matrix is multiplied with the matrix shat to get the matrix S, containing the reconstructed 3D points of the sphere.

Step 4: Evaluating the reconstructed 3D structure

Finally, we evaluate the results of implementing this algorithm by first plotting the reconstructed 3D sphere, as shown below:

In addition, we compute the mean Euclidean error between the points of the initial 3D model and the one we reconstructed in the previous step as a benchmark to see how accurate our reconstruction was. The formula to do this in the case of 3D points is as follows:

$\sqrt{\sum_{i=1}^n \frac{(\hat{x_i}-x_i)^2+ (\hat{y_i}-y_i)^2 + (\hat{z_i}-z_i)^2}{n}}$

Where $\hat{x_i}$ , $\hat{y_i}$ , and $\hat{z_i}$ represent the dimensions of the $i_{th}$ reconstructed point and $x_i$ , $y_i$ , and $z_i$ are the dimensions of the ith original point.

Conclusion

In summary, the Tomasi-Kanade Factorization algorithm is a classic approach for solving the rigid structure from motion problem by factorizing a matrix of 2D point correspondences into a 3D structure and camera motion components. While it proves to be a solid foundation in computer vision, it might require additional constraints or information to produce accurate reconstructions in complex, real-world scenarios.

If you want to master Python algorithms, check out our course: Mastering Algorithms for Problem Solving in Python.

Frequently asked questions

Haven’t found what you were looking for? Contact Us

What are the steps involved in implementing the Tomasi-Kanade Factorization algorithm in Python?

The main steps include:

Visualizing 3D data and generating corresponding 2D points.
Applying intrinsic and extrinsic matrices to convert 3D points into 2D projections.
Reconstructing the 3D structure from the 2D points using Singular Value Decomposition (SVD).
Evaluating the reconstructed 3D structure and computing the mean Euclidean error to measure accuracy.

What is the purpose of Singular Value Decomposition (SVD) in this algorithm?

SVD is used to decompose the matrix of 2D points into three components (U, S, and V), helping extract the structure of the 3D model and camera motion. The first three columns of the matrix U represent the dominant directions of data vectors, which are critical in estimating the camera motion and 3D structure.

How is the accuracy of the reconstructed 3D model evaluated?

The accuracy of the reconstructed 3D model is evaluated by calculating the mean Euclidean error between the original and reconstructed 3D points. This error provides a benchmark to measure how closely the reconstructed model matches the original data.

What are the applications of the Tomasi-Kanade Factorization algorithm?

The algorithm has various applications, including:

Robotics or creating 3D maps and motion tracking.
Augmented Reality for reconstructing real-world objects from multiple viewpoints.
3D Modeling for generating accurate 3D models from images.