Rendering is the process of taking a 3D scene and producing a 2D image via an algorithm analogous to taking a virtual photo in a virtual world. It is the final stage in the computer graphics pipeline, applying some algorithm to all relevant information from a 3D scene to create an image.

The science of rendering goes back many decades and across many generations of computing, even as far back as the 1960s. Likewise, researchers and engineers have devised a rich ecosystem of techniques, both theoretical and practical, to account for varying levels of computing capability and desired fidelity. Some of these techniques include ray tracing, path tracing, scanline rendering and rasterization, neural rendering, and more.

Research into the techniques of differentiable rendering is accelerating as deep learning and the metaverse become more attractive. Differentiable rendering promises to use the power of machine learning to grant us insights into 3D worlds through the lens of our 2D renders. To see why differentiable rendering is unique and necessary, we first visit traditional rendering.

Traditional rendering

As mentioned before, the world of 3D rendering is incredibly vast, so providing a comprehensive overview of the many techniques would be impractical. However, we’ll introduce some of the most basic components of the process and what makes them traditionally non-differentiable to motivate our discussion of differentiable rendering. This overview will focus on the techniques employed by PyTorch3D: rasterization, z-buffering, and shading.


We have our 3D scene, and we have a virtual camera. The first step toward creating an image from this camera is called rasterization: the process of taking vectorized information, such as meshes in a 3D scene, and projecting them onto what’s called a raster image—a 2D grid where its cells correspond to pixels.

For now, we consider 3D models that are composed of polygons, small planes in 3D space defined by three or more connected points. The polygons are like molecules in that they are the smallest components that describe the structure of a 3D model. These polygons are connected to define the surface topology or where the exterior surface of the object is in 3D space. The geometry of such a 3D model is the overall structure described by its polygons. For now, we also use the term faces to refer to polygons.

Das raster is the German phrase for screen or grid and conveys some of the intuition of how this process works. Imagine peering through a single cell, or pixel, of this grid. The process of rasterization traces the scene to identify the relevant geometry and scene contents to inform the rendering of that pixel. It consists primarily of the following steps:

  1. It projects the vertices of relevant geometry into the camera view (i.e., into camera screen coordinates).

  2. For each pixel, it identifies the geometry that projects into that pixel.

  3. It renders the geometry accordingly.

Rasterization is essentially projecting surfaces from the scene into that pixel to determine which scene components will be needed to produce a value at that pixel. This could be a single polygon of geometry, or it could be many polygons, possibly overlapping one another. It could even be pure background content. Different shading approachesThe techniques used to convert geometry information into color. will dictate how to convert this information into pixel values.

In other words, rasterization is said to be a solution to the visibility problem, which is to say: What parts of a scene are visible to a given pixel? A practical way to solve this problem is to project the vertices of our scene object(s) into image space. We can then rely on the object geometry to tell us if that portion of the object is visible to a given pixel. This process of rasterization is essentially solving which portions of our geometry are inside the frustum of each pixel.

Get hands-on with 1200+ tech skills courses.