Project: Bag of Visual Words

The bag of words model has been used extensively in natural language processing to represent a document as a feature vector. Consider a language with a vocabulary $V$ of $m$ words. We can assume some ordering of the words (e.g., lexicographic) in $V$, resulting in $v_1, v_2, \dots, v_m$, where $v_k$ is the $k^{th}$ word in the vocabulary $V$. To make a feature vector for any document, create a vector $\mathbf{x}$ of $m$ components, that is, $\mathbf{x} \in \mathbb{R}^m$, where the $k^{th}$ component of $\mathbf{x}$ represents the frequency of the word $v_k$ in the document.
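As a quick illustration, this construction can be sketched in a few lines of Python (the vocabulary and document here are hypothetical, chosen just to show the counting):

```python
from collections import Counter

def bow_vector(document, vocabulary):
    """Count how often each vocabulary word occurs in the document.

    The k-th component of the result is the frequency of the k-th
    vocabulary word, so every document maps to a vector of the same length.
    """
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]

# Hypothetical tiny vocabulary, already in a fixed (lexicographic) order.
vocab = ["cat", "dog", "truck"]
doc = "the cat saw the truck and the cat ran"
print(bow_vector(doc, vocab))  # → [2, 0, 1]
```

Note that shuffling the words of `doc` would leave the output unchanged, which is exactly the permutation-invariance property mentioned below.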

Note: The feature vectors of different-sized documents have the same number of components, and any reordering (permutation) of the words in a document keeps the feature vector unchanged.

The bag of visual words (BoVW) is a method of representing images as fixed-length feature vectors. To define the vocabulary for images, one common approach is to cluster fixed-size patches cropped from all the training-set images using $k$-means clustering. The centroids of these clusters form the vocabulary.
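A minimal sketch of this vocabulary-building step is shown below, assuming grayscale images as 2-D NumPy arrays and scikit-learn's `KMeans`; the patch count, patch size, and number of visual words are illustrative choices, not values prescribed by the project:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(images, patch_size=8, n_words=50,
                     patches_per_image=20, seed=0):
    """Crop random fixed-size patches from each image and cluster them.

    The k-means centroids returned here play the role of the visual
    words: each is a flattened patch of shape (patch_size * patch_size,).
    """
    rng = np.random.default_rng(seed)
    patches = []
    for img in images:
        h, w = img.shape
        for _ in range(patches_per_image):
            y = rng.integers(0, h - patch_size + 1)
            x = rng.integers(0, w - patch_size + 1)
            patches.append(img[y:y + patch_size, x:x + patch_size].ravel())
    km = KMeans(n_clusters=n_words, n_init=10, random_state=seed)
    km.fit(np.stack(patches))
    return km.cluster_centers_  # shape: (n_words, patch_size * patch_size)
```

In practice the patches are often sampled densely rather than randomly, but the clustering step is the same either way.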


To estimate frequencies and compare images, the most common approach is to use a similarity measure. This involves partitioning an image into adjacent patches of the same size as those used in training, computing the similarity of each patch to every visual word, and accumulating the similarities in the corresponding cells of the feature vector. The closest visual word receives the highest similarity value. Finally, the feature vector is normalized to reduce the impact of image size.
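The feature-extraction step above can be sketched as follows, assuming grayscale NumPy images and a vocabulary of flattened patch centroids (e.g., from the clustering step). The similarity function `1 / (1 + distance)` is one illustrative choice that makes the closest visual word score highest; the project may specify a different measure:

```python
import numpy as np

def bovw_features(image, vocabulary, patch_size=8):
    """Tile the image into adjacent patches, score each patch against
    every visual word, and accumulate the similarities per word."""
    n_words = vocabulary.shape[0]
    feats = np.zeros(n_words)
    h, w = image.shape
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patch = image[y:y + patch_size, x:x + patch_size].ravel()
            # Smaller distance -> larger similarity, so the closest
            # visual word contributes the most to its cell.
            dists = np.linalg.norm(vocabulary - patch, axis=1)
            feats += 1.0 / (1.0 + dists)
    # Normalize so images of different sizes yield comparable vectors.
    return feats / np.linalg.norm(feats)
```

With this in place, two images can be compared simply by comparing their fixed-length feature vectors, regardless of the original image dimensions.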

Feature extraction in BoVW

In this project, the BoVW method is used to represent images as fixed-length feature vectors. The goal is to classify images of cats and trucks. The project is divided into sub-tasks, including importing modules, loading datasets, visualizing images, cropping patches, creating a bag of visual words, preparing data for the classifier, and building a classifier.