Implementing the Tomasi-Kanade Factorization algorithm in Python

Key takeaways:

  • The Tomasi-Kanade Factorization algorithm helps solve the rigid structure from motion (RSfM) problem, useful in 3D reconstruction and computer vision.

  • It takes 2D point correspondences from multiple images and recovers both the 3D coordinates and camera motion.

  • In the first step, 3D data (a sphere) is visualized using a scatter plot.

  • Next, 2D points are generated from the 3D points using intrinsic and extrinsic camera matrices.

  • Singular Value Decomposition (SVD) is applied to decompose the matrix of 2D points to extract 3D structure components.

  • The reconstructed 3D structure is evaluated by comparing it to the original data using mean Euclidean error.

  • The algorithm can be used in robotics, augmented reality, and 3D modeling, though additional constraints may be necessary for complex real-world applications.

The Tomasi-Kanade Factorization algorithm is a widely used method in computer vision and 3D reconstruction for solving the problem of the rigid structure from motion (RSFM)It is the process of estimating the 3D structure of an object or scene from a sequence of 2D images taken from different viewpoints, assuming the object's structure is rigid (i.e doesn't deform over time). When we are given a set of 2D point correspondences in multiple images taken from different perspectives, RSfM aims to recover both the 3D coordinates of the points and the camera motion between these viewpoints. This algorithm has various applications, including robotics, augmented reality, and 3D modeling.

Step-by-step procedure

Here, we will focus on implementing the Tomasi-Kanade Factorization algorithm using our Python programming knowledge. The steps to implement this are outlined below.

Step 1: Visualizing the 3D data

To start off, we load our 3D data, which is present in the form of a sphere in a .mat file. This data is then displayed with the help of a scatter plot, which shows the sphere.

#importing necessary libraries for loading the .mat file
from scipy.io import loadmat
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
#loading the .mat file
x = loadmat('/teapot_data.mat')
#separating x,y,z dimension points for scatter plot
a = x['x']
b = x['y']
c = x['z']
#plotting the 3D sphere
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(a,b,c, c='b', marker='*')

Step 2: Generating the corresponding 2D points

Because we do not want to introduce the complexity of extracting feature points from images and tracking them over multiple frames, we generate 2D points corresponding to the 3D structure we visualized in the previous step. The formula to implement in this step is given below:

              new2Dpoints=K[RT]3Dpointsnew2Dpoints = K[R|T]*3Dpoints

Where,

  • KK is the intrinsic matrixThis matrix contains information about focal length, principal point and other calibration parameters.
  • [RT][R|T] is the extrinsic matrix.This matrix represents the frame of a camera in the world coordinate system. It consists of the rotation matrix (denoted by R) and the translation matrix (denoted by T)

Here is a pictorial representation of this process for better understanding.

Converting 3D points to 2D points
Converting 3D points to 2D points

For the sake of simplicity, we take four 2D images (or four camera frames) to be reconstructed in the later steps. We first multiply these with the intrinsic matrix and then the 3D points matrix to generate the new 2D points. These 2D points are displayed with individual 2D scatter plots, as shown below:

import numpy as np
import math
#defining the K matrix with parameters fx=fy=1, cx=cy=0 and skew factor 0
#Here fx,fy are focal lengths of the camera along the x and y axes and cx,cy are the principal point's coordinates
intrinsic = np.array([[1, 0, 0],[0, 1, 0]])
#defining a rotation matrix along the y-dimension
def rotation_matrix_y(angle):
rot_angle=np.radians(angle)
R = np.array([[math.cos(rot_angle),0,math.sin(rot_angle)],[0,1,0],[-math.sin(rot_angle),0,math.cos(rot_angle)]])
return R
#here we are taking 4 2D images oriented at different angles (0, 10, 20 etc.)
R=rotation_matrix_y(0)
R1=rotation_matrix_y(10)
R2=rotation_matrix_y(20)
R3=rotation_matrix_y(30)
#mutliplying the intrinsic matrix with each of the 2D images
R=np.dot(intrinsic,R)
R1=np.dot(intrinsic,R1)
R2=np.dot(intrinsic,R2)
R3=np.dot(intrinsic,R3)
#creating a 3Dpoints matrix vertically stacking a,b,c on top of each other
threeDpoints=np.vstack([a,b,c])
#creating a trans matrix which is the product of K and R with dimension 2F*3 where F=4 by vertically stacking R, R1, R2, R3 on top of each other
trans=np.vstack([R,R1,R2,R3])
print(np.shape(trans))
#computing new 2D points for the spherical structure
new2Dpoints=np.dot(trans,threeDpoints)
#Pair of x,y dimension of 1st 2D image
a1=new2Dpoints[0]
b1=new2Dpoints[1]
#Pair of x,y dimensions of 2nd 2D image
c1=new2Dpoints[2]
d1=new2Dpoints[3]
#Pair of x,y dimension of 3rd 2D image
e1=new2Dpoints[4]
f1=new2Dpoints[5]
#Pair of x,y dimension of 4th 2D image
g1=new2Dpoints[6]
h1=new2Dpoints[7]
fig = plt.figure(figsize=(15,25))
ax = fig.add_subplot(411)
#Plotting 2D image with zero degree rotation
ax.set_title('2D image with zero degree rotation')
ax.scatter(a1,b1, marker='*')
ax1=fig.add_subplot(412)
#Plotting 2D image with ten degree rotation
ax1.set_title('2D image with ten degree rotation')
ax1.scatter(c1,d1,marker='*')
ax2=fig.add_subplot(413)
#Plotting 2D image with twenty degree rotation
ax2.set_title('2D image with twenty degree rotation')
ax2.scatter(e1,f1,marker='*')
ax3=fig.add_subplot(414)
#Plotting 2D image with thirty degree rotation
ax3.set_title('2D image with thirty degree rotation')
ax3.scatter(g1,h1,marker='*')

Note: We take the rotation matrix for the y-dimension so that the 2D images are rotated along the y-axis. But of course, depending on our problem, we can choose the rotation matrix along any axis that we desire.

Step 3: Reconstructing the 3D structure from the 2D points

The new 2D points generated in the previous step comprise the part of the measurement matrixThe matrix where the columns represent the 2D image point measurements across different frames. (denoted as W), which will be decomposed using the singular value decompositionA matrix factorization method that generalizes the decomposition of a square matrix (order n x n) to any matrix (order n x m) technique as follows:

#initializing the W matrix
W=new2Dpoints
#applying SVD on the W matrix
u, s, v = np.linalg.svd(np.float32(W))
#constructing a diagonal matrix from the singular values found above
sig=np.diag(s)
# Extracting the first 3 columns of the left singular vectors matrix (u)
Rhatf=u[:,0:3]
# Extracting the first 3x3 submatrix from the diagonal matrix (sig)
s=sig[0:3,0:3]
# Extracting the first 3 rows of the right singular vectors matrix (v)
shat=v[0:3:,]
#number of augmented 2Dimages we took in the previous steps
nimages=4
#Initializing the A matrix with shape 3*numofimages*6 where num of images is 4
A=np.zeros((3*nimages,6))
#Loop through the number of images to apply orthogonality constraints
for i in range(nimages):
a=2*i
b=2*i+1
r1=Rhatf[a,0]
r2=Rhatf[a,1]
r3=Rhatf[a,2]
r4=Rhatf[b,0]
r5=Rhatf[b,1]
r6=Rhatf[b,2]
# Fill the rows of the A matrix with specific values based on orthogonality constraints
A[a:b+2,:]=np.float32([[r1*r1,2*r1*r2,2*r1*r3,r2*r2,2*r2*r3,r3*r3],
[r1*r4,(r2*r4)+(r1*r5),(r3*r4)+(r1*r6),r2*r5,(r3*r5)+(r2*r6),r3*r6],
[r4*r4,2*r4*r5,2*r4*r6,r5*r5,2*r5*r6,r6*r6]])
#multiplying the b vector with n images which will correspond to every 3 rows of A
b=np.array(nimages*[1,0,1])
#Calculate the pseudo-inverse of the A matrix
Ahat=np.linalg.pinv(A)
#Solving for x using the pseudo-inverse and the transpose of the b vector
x=np.dot(Ahat,np.transpose(b))
#Initializing an empty 3x3 matrix QQT
QQT = np.zeros((3, 3))
#assigning values of QQT matrix with values from the x vector
QQT[0, 0] = x[0]
QQT[1, 1] = x[3]
QQT[2, 2] = x[5]
QQT[0, 1] = x[1]
QQT[1, 0] = x[1]
QQT[0, 2] = x[2]
QQT[2, 0] = x[2]
QQT[2, 1] = x[4]
QQT[1, 2] = x[4]
# Performing Cholesky decomposition on the QQT matrix
Q=np.linalg.cholesky(QQT)
# Calculating the inverse of the Q matrix to transform shat into the reconstructed structure
Qinv=np.linalg.inv(Q)
# Calculating the reconstructed 3D points by multiplying Qinv and shat
S=np.dot(Qinv,shat)
  • After decomposing the matrix W, we apply the orthogonality constraintsThese are used to impose mathematical conditions that ensure vectors (in this case within 'A' ) are perpendicular to each other. when populating the matrix A with components from the matrix Rhat. This matrix corresponds to the dominant directions of data vectors, which will help us estimate the reconstructed data’s rotation matrix while reducing dimensionality.

  • After this, we take the pseudo-inverse of this matrix A because it is initially an over-determined system (a system of equations having more equations than unknown variables).

  • It is multiplied with the matrix b to get the matrix QQT that will help us recover the covariance matrix of the 3D points, which will be used in estimating the uncertainty of the reconstructed 3D structure in the step after this.

  • Finally, the Cholesky decompositionThis decompositon method is used for for decomposing a symmetric, positive-definite matrix (in this case QQT) into the product of a lower triangular matrix and the transpose of the lower triangular matrix. is performed on QQT to obtain the lower-triangular matrix Q. This Q matrix is multiplied with the matrix shat to get the matrix S, containing the reconstructed 3D points of the sphere.

Step 4: Evaluating the reconstructed 3D structure

Finally, we evaluate the results of implementing this algorithm by first plotting the reconstructed 3D sphere, as shown below:

#Extracting x-dimension of the reconstructed 3D points
a1 = S[0,:]
#Extracting y-dimension of the reconstructed 3D points
b1 = S[1,:]
#Extracting z-dimension of the reconstructed 3D points
c1 = S[2,:]
#Plotting reconsructed 3D model with a 3D scatter plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter3D(a1,b1,c1,color='green', marker='*')

In addition, we compute the mean Euclidean error between the points of the initial 3D model and the one we reconstructed in the previous step as a benchmark to see how accurate our reconstruction was. The formula to do this in the case of 3D points is as follows:

              i=1n(xi^xi)2+(yi^yi)2+(zi^zi)2n\sqrt{\sum_{i=1}^n \frac{(\hat{x_i}-x_i)^2+ (\hat{y_i}-y_i)^2 + (\hat{z_i}-z_i)^2}{n}}

Where xi^\hat{x_i}, yi^\hat{y_i}, and zi^\hat{z_i} represent the dimensions of the ithi_{th} reconstructed point and xix_i,yiy_i, and ziz_i are the dimensions of the ith original point.

#A,B,C vectors correspond to the original 3D model x,y,z dimensional vectors here
err = 0.0
j = 0
for i in range(len(a1)):
err = err + math.sqrt(((a1[i]-A[0][i])*(a1[i]-A[0][i])) + ((b1[i]-B[0][i])*(b1[i]-B[0][i])) + ((c1[i]-C[0][i])*(c1[i]-C[0][i])))
j=j+1
print ("The mean euclidean error is: ",(err/j))

Conclusion

In summary, the Tomasi-Kanade Factorization algorithm is a classic approach for solving the rigid structure from motion problem by factorizing a matrix of 2D point correspondences into a 3D structure and camera motion components. While it proves to be a solid foundation in computer vision, it might require additional constraints or information to produce accurate reconstructions in complex, real-world scenarios.

If you want to master Python algorithms, check out our course: Mastering Algorithms for Problem Solving in Python.

Frequently asked questions

Haven’t found what you were looking for? Contact Us


What are the steps involved in implementing the Tomasi-Kanade Factorization algorithm in Python?

The main steps include:

  • Visualizing 3D data and generating corresponding 2D points.
  • Applying intrinsic and extrinsic matrices to convert 3D points into 2D projections.
  • Reconstructing the 3D structure from the 2D points using Singular Value Decomposition (SVD).
  • Evaluating the reconstructed 3D structure and computing the mean Euclidean error to measure accuracy.

What is the purpose of Singular Value Decomposition (SVD) in this algorithm?

SVD is used to decompose the matrix of 2D points into three components (U, S, and V), helping extract the structure of the 3D model and camera motion. The first three columns of the matrix U represent the dominant directions of data vectors, which are critical in estimating the camera motion and 3D structure.


How is the accuracy of the reconstructed 3D model evaluated?

The accuracy of the reconstructed 3D model is evaluated by calculating the mean Euclidean error between the original and reconstructed 3D points. This error provides a benchmark to measure how closely the reconstructed model matches the original data.


What are the applications of the Tomasi-Kanade Factorization algorithm?

The algorithm has various applications, including:

  • Robotics or creating 3D maps and motion tracking.
  • Augmented Reality for reconstructing real-world objects from multiple viewpoints.
  • 3D Modeling for generating accurate 3D models from images.

Free Resources

Copyright ©2025 Educative, Inc. All rights reserved