
Voxels

Explore the fundamentals of voxel grids as a 3D data representation and their role in machine learning. Understand how voxels extend 2D pixels into three dimensions, enabling applications like 3D shape occupancy modeling, object detection, and mesh generation. This lesson guides you through creating voxel grids, using PyTorch3D’s Volumes class, and rendering with ray marching, helping you manage the benefits and limitations of voxel-based data.

The voxel grid

The first data representation we introduce is the most straightforward one: the voxel grid. The voxel grid approach is very similar to the dense 2D grid structure we see in images. The only difference is the extension into a third spatial dimension. Voxel grids are also called volumes, and PyTorch3D has its own Volumes class for them.
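To make the analogy with images concrete, here is a minimal sketch that stores an image and a voxel grid as dense NumPy arrays. The sizes and the (depth, row, column) axis order are assumptions chosen for illustration; a library such as PyTorch3D may use a different, batched layout.

```python
import numpy as np

# Hypothetical sizes chosen only for illustration.
height, width, depth = 4, 5, 3

# A 2D image: one value per pixel, indexed by (row, column).
image = np.zeros((height, width), dtype=np.float32)

# A voxel grid: one value per voxel, indexed by (depth, row, column).
voxels = np.zeros((depth, height, width), dtype=np.float32)

# Writing to a single cell works the same way in both structures.
image[1, 2] = 1.0
voxels[0, 1, 2] = 1.0

print(image.shape, image.size)    # (4, 5) 20
print(voxels.shape, voxels.size)  # (3, 4, 5) 60
```

The extra axis is the only structural change, but it also means the cell count grows with the cube of the resolution rather than the square.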

Rather than representing visual data (e.g., light) projected onto a 2D plane, voxel grids are a discrete data structure directly representing the 3D physical space. Instead of 2D pixels, which bin light within a rectangular receptive field, we have voxels, which are the 3D analog of pixels. Images can represent more than just color; images can represent depth from time-of-flight sensors or LiDAR (Light Detection and Ranging, a remote sensing technology that uses laser light to measure depth or distances with high precision), radiation measurements from PET, SPECT, and CT scanning, and more. Likewise, voxels in a 3D space can represent density, probability, and color, just to name a few.
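As a rough sketch of what "discrete" means here, the snippet below maps a continuous 3D point to the voxel that contains it and stores several values in that voxel. The helper `world_to_voxel`, the grid bounds, the resolution, and the four-channel layout (density plus RGB) are all hypothetical choices made for this example.

```python
import numpy as np

def world_to_voxel(point, grid_min, voxel_size):
    """Return the integer (x, y, z) index of the voxel containing `point`."""
    offset = np.asarray(point, dtype=np.float64) - np.asarray(grid_min, dtype=np.float64)
    return tuple(int(i) for i in np.floor(offset / voxel_size))

# Hypothetical grid: 8 x 8 x 8 voxels covering the cube [-1, 1) on each axis.
resolution = 8
grid_min = (-1.0, -1.0, -1.0)
voxel_size = 2.0 / resolution  # 0.25 units per voxel edge

# Four channels per voxel: density followed by an RGB color.
grid = np.zeros((resolution, resolution, resolution, 4), dtype=np.float32)

index = world_to_voxel((0.3, -0.9, 0.0), grid_min, voxel_size)
grid[index] = [1.0, 0.8, 0.2, 0.2]  # density, red, green, blue

print(index)  # (5, 0, 4)
```

Every point inside the same cell maps to the same index, which is the 3D counterpart of a pixel binning light over its footprint.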

Example of a voxel grid

This simple grid design allows us to carry over most of the same tools we use in deep neural networks for computer vision, such as convolutions, pooling, and activation functions. In many cases, other types of 3D data such as meshes and point clouds can be easily massaged into the voxel grid format as well.
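The two ideas in this paragraph can be sketched in plain NumPy: converting a point cloud into a voxel grid, then applying an image-style operation (2x max pooling) to the result. The point cloud is synthetic, and the helpers `voxelize` and `max_pool_2x` are hypothetical names written for this illustration. In an actual network, one would more likely reach for a framework's 3D layers, such as `torch.nn.Conv3d` and `torch.nn.MaxPool3d` in PyTorch.

```python
import numpy as np

def voxelize(points, grid_min, voxel_size, resolution):
    """Mark every voxel that contains at least one point."""
    grid = np.zeros((resolution,) * 3, dtype=np.float32)
    idx = np.floor((points - grid_min) / voxel_size).astype(int)
    # Discard points that fall outside the grid bounds.
    inside = np.all((idx >= 0) & (idx < resolution), axis=1)
    idx = idx[inside]
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid

def max_pool_2x(grid):
    """2x2x2 max pooling via reshaping (assumes even side lengths)."""
    d, h, w = grid.shape
    blocks = grid.reshape(d // 2, 2, h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3, 5))

rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(200, 3))  # synthetic point cloud

grid_min = np.array([-1.0, -1.0, -1.0])
grid = voxelize(points, grid_min, voxel_size=0.25, resolution=8)
pooled = max_pool_2x(grid)

print(grid.shape, pooled.shape)  # (8, 8, 8) (4, 4, 4)
```

Pooling works exactly as it does on images, with one more axis to reduce over.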

Use cases

Voxel grids are useful in applications that need a dense representation of data.

Modeling occupancy

In the case of 3D shape data, the simplest technique involving voxel grids is to model the binary occupancy of the 3D space. In other words, every voxel contains either a 0 if geometry does not exist or a 1 if geometry does exist at that location in 3D space. This can be extended to a Bernoulli distribution (which models a random experiment with two possible outcomes, usually success and failure, and is characterized by a single parameter representing the probability of success) by allowing each voxel to contain a probability p in the range ...
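A minimal sketch of both occupancy variants, using a sphere as the shape. The resolution, the sphere radius, and the sigmoid used to turn distance into a per-voxel probability are assumptions made for illustration, not a method taken from this lesson.

```python
import numpy as np

resolution = 16
# Voxel coordinates spanning [-1, 1] along each axis.
axis = np.linspace(-1.0, 1.0, resolution)
x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
distance = np.sqrt(x**2 + y**2 + z**2)

# Binary occupancy: 1 inside a sphere of radius 0.5, 0 elsewhere.
occupancy = (distance <= 0.5).astype(np.uint8)

# Probabilistic occupancy: a soft falloff around the surface
# (hypothetical choice: a sigmoid of the signed distance to the surface).
sharpness = 10.0
probability = 1.0 / (1.0 + np.exp(sharpness * (distance - 0.5)))

# Draw one binary grid by sampling each voxel's Bernoulli distribution.
rng = np.random.default_rng(0)
sample = (rng.random(probability.shape) < probability).astype(np.uint8)

# Thresholding the probabilities gives a hard 0/1 grid again.
thresholded = (probability >= 0.5).astype(np.uint8)

print(occupancy.shape, int(occupancy.sum()), int(thresholded.sum()))
```

The binary grid is the special case in which every voxel's probability is exactly 0 or 1.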