Solving Jigsaw Puzzles

Learn to implement self-supervised learning by solving jigsaw puzzles.

Jigsaw puzzles

Similar to predicting the relative position of patches, this pretext task asks a neural network to solve jigsaw puzzles so that it develops a visuospatial representation of the objects in the image. As shown in the figure below, the input image $X_i$ is first split into a $3\times3$ grid, and all nine patches, $X_i = [X_i^{p_1}, X_i^{p_2}, \ldots, X_i^{p_9}]$, are shuffled according to a random permutation, $p$, selected from a set of predefined permutations, $P$. The shuffled image, $\text{permute}(X_i, p)$, is passed through the neural network $f(\cdot)$, which predicts which permutation from the set $P$ was applied to the image (i.e., $f(\text{permute}(X_i, p)) = p$). Mathematically, the neural network solves a $\vert P \vert$-way classification problem and minimizes the following loss function:
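To make the data flow concrete, here is a minimal pure-Python sketch of the pipeline described above: split an image into a 3×3 grid, shuffle the patches with a permutation drawn from a predefined set, and score a prediction. It assumes the loss is the standard cross-entropy over permutation indices, which is the usual choice for a |P|-way classification. The helper names (`split_into_patches`, `make_permutation_set`, `cross_entropy`) are hypothetical, the permutation set is simply sampled at random, and the uniform logits are a stand-in for the network output.

```python
import math
import random

def split_into_patches(image, grid=3):
    """Split a 2D image (list of rows) into grid*grid patches in row-major order."""
    h, w = len(image), len(image[0])
    ph, pw = h // grid, w // grid  # patch size; any remainder pixels are cropped
    patches = []
    for r in range(grid):
        for c in range(grid):
            rows = image[r * ph:(r + 1) * ph]
            patches.append([row[c * pw:(c + 1) * pw] for row in rows])
    return patches

def make_permutation_set(num_perms, num_patches=9, seed=0):
    """Build a hypothetical predefined set P of distinct permutations (random here)."""
    rng = random.Random(seed)
    perms = set()
    while len(perms) < num_perms:
        perms.add(tuple(rng.sample(range(num_patches), num_patches)))
    return sorted(perms)

def permute(patches, perm):
    """Reorder patches so that position k holds the original patch perm[k]."""
    return [patches[j] for j in perm]

def cross_entropy(logits, target):
    """Cross-entropy of a single |P|-way prediction, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# Toy 6x6 "image" whose pixel values encode their own position.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
patches = split_into_patches(image)

P = make_permutation_set(num_perms=4)        # |P| = 4 classes in this toy setup
label = random.Random(1).randrange(len(P))   # index of the permutation p in P
shuffled = permute(patches, P[label])        # permute(X_i, p)

# Stand-in for f(permute(X_i, p)): an untrained network with uniform logits.
logits = [0.0] * len(P)
loss = cross_entropy(logits, label)          # equals log|P| for uniform logits
```

The classification target is the index of the permutation within `P`, not the permutation itself, which is what keeps the problem a fixed-size |P|-way classification. In a real setup, `logits` would come from the network and the loss would be averaged over a batch.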
