Overview

The PointNet architecture is a foundational model for processing point cloud data. Although it was introduced in 2017, it remains powerful and efficient, and it has many desirable properties for working with point clouds compared to alternatives such as voxel grids or 2D image projections. PointNet provides a generic framework that supports classification, 3D object detection, point normal prediction, part segmentation, semantic scene segmentation, and more. We first introduce the PointNet architecture and then train an example implementation on a toy problem.

Machine learning for point clouds

Generally speaking, we treat a point cloud as a sequence of $n$ points $(x, y, z) \in \mathbb{R}^3$ in Euclidean space. The coordinates serve as each point's features, and each point can optionally carry any number of additional features, such as normals, colors, or density.
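In code, this usually means one row per point and one column per feature. The NumPy sketch below builds such an array; the random coordinates and the RGB "colors" are made-up placeholder data, not part of any real dataset.

```python
import numpy as np

# Toy point cloud: n points, each with (x, y, z) coordinates.
rng = np.random.default_rng(seed=0)
n = 1024
xyz = rng.uniform(-1.0, 1.0, size=(n, 3)).astype(np.float32)

# Optional extra per-point features (hypothetical RGB colors in [0, 1]).
rgb = rng.uniform(0.0, 1.0, size=(n, 3)).astype(np.float32)

# Concatenate along the feature axis: row i holds every feature of point i.
points = np.concatenate([xyz, rgb], axis=1)

print(points.shape)  # (1024, 6)
```

Keeping the coordinates in the first three columns makes it easy to apply geometric operations (rotations, translations) to them while leaving the other features untouched.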

Working with point cloud data in machine learning can be challenging for several reasons. For one, the points in a point cloud are often stored in an arbitrary order, so a model should ideally ignore the order of its inputs. Convolutional and recurrent architectures do not: their outputs depend on the order in which the inputs are presented.
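To make the order dependence concrete, here is a minimal sketch of a tiny recurrent cell that reads points one at a time. The weights `W_h` and `W_x` and the function `simple_rnn` are hypothetical, chosen only to illustrate the effect: feeding the same points in reverse order produces a different final state.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
points = rng.normal(size=(8, 3))
reordered = points[::-1]  # the same 8 points, listed in reverse order

# Hypothetical weights for a tiny recurrent cell with a 4-dimensional state.
W_h = rng.normal(size=(4, 4))
W_x = rng.normal(size=(4, 3))

def simple_rnn(p):
    """Consume points one at a time; the final state depends on the visiting order."""
    h = np.zeros(4)
    for x in p:
        h = np.tanh(W_h @ h + W_x @ x)
    return h

print(np.allclose(simple_rnn(points), simple_rnn(reordered)))  # False
```

Both inputs describe exactly the same set of points, yet the model treats them as different inputs.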

PointNet addresses many of these issues through subtle design choices. Its design aims to provide the following properties:

  • Permutation invariance: Points in a point cloud have no meaningful order, so a model should be invariant to point order. PointNet is invariant to all $n!$ permutations of its $n$ input points.

  • Transformation invariance: Transformations like rotation and translation should have no effect on the underlying semantics of the object. An apple is still an apple, even if it is flipped upside down or moved to the side. PointNet attempts to make predictions that are robust to transformations.

  • Point proximity: Points that are close in 3D space are more closely related than points that are far away. PointNet captures local structures so that tasks like segmentation can leverage local point relationships.
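The transformation and proximity properties can be illustrated together. In the NumPy sketch below, a rotation about the z-axis followed by a translation changes every coordinate, but all pairwise distances between points, and therefore the local neighborhoods, stay the same. The helper names `rotation_z` and `pairwise_distances` are illustrative, not part of PointNet.

```python
import numpy as np

def rotation_z(theta):
    """3x3 rotation matrix about the z-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def pairwise_distances(p):
    """(n, n) matrix of Euclidean distances between all pairs of points."""
    diff = p[:, None, :] - p[None, :, :]
    return np.linalg.norm(diff, axis=-1)

rng = np.random.default_rng(seed=0)
points = rng.normal(size=(16, 3))

# Rotate each point by 90 degrees about z (row-vector form: p @ R.T), then shift the cloud.
moved = points @ rotation_z(np.pi / 2).T + np.array([5.0, -2.0, 1.0])

print(np.allclose(points, moved))                                          # False
print(np.allclose(pairwise_distances(points), pairwise_distances(moved)))  # True
```

A model that sees only raw coordinates must learn that these two inputs describe the same object, which is the motivation for making predictions robust to such transformations.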

PointNet architecture

The PointNet architecture is a simple, extensible composition of building blocks that PyTorch provides out of the box. Through some clever design, the model achieves several desirable properties for point cloud processing. The following components are the key innovations of the PointNet architecture:

  • Symmetric functions for permutation invariance

  • Joint alignment networks for transformation invariance

  • Local and global information aggregation
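As a rough orientation before looking at each component, the PyTorch sketch below shows one way the three ideas can fit together in a segmentation-style network. It is a simplified illustration under stated assumptions, not the original PointNet configuration: the class names `TNet` and `TinyPointNetSeg` and the layer widths are hypothetical, and the sketch omits batch normalization, the second alignment network applied to per-point features, and other details of the full model. Shared per-point MLPs are written as `nn.Conv1d` layers with `kernel_size=1`, which apply the same weights to every point.

```python
import torch
import torch.nn as nn

class TNet(nn.Module):
    """Simplified alignment network: predicts a k x k transform from the whole point set."""
    def __init__(self, k):
        super().__init__()
        self.k = k
        self.features = nn.Sequential(
            nn.Conv1d(k, 64, 1), nn.ReLU(),    # shared per-point MLP
            nn.Conv1d(64, 256, 1), nn.ReLU(),
        )
        self.head = nn.Linear(256, k * k)
        # Start from the identity transform: zero weights, identity bias.
        nn.init.zeros_(self.head.weight)
        with torch.no_grad():
            self.head.bias.copy_(torch.eye(k).flatten())

    def forward(self, x):                               # x: (batch, k, n)
        g = self.features(x).max(dim=2).values          # symmetric max pool -> (batch, 256)
        return self.head(g).view(-1, self.k, self.k)    # (batch, k, k)


class TinyPointNetSeg(nn.Module):
    """Hypothetical, scaled-down PointNet-style per-point classifier."""
    def __init__(self, num_classes, in_dim=3):
        super().__init__()
        self.input_tnet = TNet(in_dim)
        self.local_mlp = nn.Sequential(nn.Conv1d(in_dim, 64, 1), nn.ReLU())
        self.global_mlp = nn.Sequential(nn.Conv1d(64, 256, 1), nn.ReLU())
        self.seg_head = nn.Sequential(
            nn.Conv1d(64 + 256, 128, 1), nn.ReLU(),
            nn.Conv1d(128, num_classes, 1),
        )

    def forward(self, x):                               # x: (batch, in_dim, n)
        n = x.shape[2]
        x = torch.bmm(self.input_tnet(x), x)            # apply the predicted alignment to every point
        local = self.local_mlp(x)                       # per-point (local) features: (batch, 64, n)
        global_feat = self.global_mlp(local).max(dim=2, keepdim=True).values  # (batch, 256, 1)
        # Concatenate each point's local feature with the shared global feature.
        combined = torch.cat([local, global_feat.expand(-1, -1, n)], dim=1)
        return self.seg_head(combined)                  # per-point scores: (batch, num_classes, n)


model = TinyPointNetSeg(num_classes=4)
scores = model(torch.randn(2, 3, 1024))  # 2 clouds of 1,024 points each
print(scores.shape)  # torch.Size([2, 4, 1024])
```

Because every point passes through the same layers and the only cross-point operation is a max pool, shuffling the input points simply shuffles the per-point outputs in the same way. A classification variant would feed the pooled global feature to a small fully connected head instead of concatenating it back onto each point.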

Symmetric functions for permutation invariance

To achieve permutation invariance while keeping the model efficient and stable, PointNet pools a collection of points with a symmetric function, that is, a function whose output does not depend on the order of its inputs. For example, a symmetric function $f$ with inputs $x_1$ and $x_2$ satisfies:

$$f(x_1, x_2) = f(x_2, x_1)$$
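The symmetric-function idea can be checked numerically. In the NumPy sketch below, max, sum, and mean over the point axis give the same result for a point set and its reversed copy, while a position-weighted sum does not. The last part loosely mirrors the PointNet pattern of applying one shared per-point map before max pooling; the weight matrix `W` and the function `pooled_feature` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
points = rng.normal(size=(10, 3))
shuffled = points[::-1]  # same points, reversed order

# Symmetric functions over the point axis: order does not matter.
for name, fn in [("max", np.max), ("sum", np.sum), ("mean", np.mean)]:
    print(name, np.allclose(fn(points, axis=0), fn(shuffled, axis=0)))  # True for each

# A non-symmetric function for contrast: weights tied to list position.
weights = np.arange(1, 11)[:, None]
print("weighted", np.allclose((weights * points).sum(axis=0),
                              (weights * shuffled).sum(axis=0)))  # False

# Shared per-point map followed by symmetric pooling.
W = rng.normal(size=(3, 16))

def pooled_feature(p):
    per_point = np.maximum(p @ W, 0.0)  # identical ReLU(linear) map applied to every point
    return per_point.max(axis=0)        # max pool over points -> one 16-d vector

print("shared map + max", np.allclose(pooled_feature(points), pooled_feature(shuffled)))  # True
```

Max pooling is a common choice here because it keeps the strongest response for each feature across the whole set, regardless of which point produced it.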
