Some common terms used throughout the course

As a reference, we’ll provide a glossary of some common technical terms used in the course.


Some common JAX terms are presented here; their meanings in JAX may differ slightly from how the same words are used elsewhere.

  • Asynchronous Dispatch: The mechanism by which a jitted function returns control to Python before its computation finishes, so subsequent operations can be queued while the accelerator is still working.
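A minimal sketch of this behavior (the array size is arbitrary): calling a jitted function returns immediately, and block_until_ready() explicitly waits for the result.

```python
import jax
import jax.numpy as jnp

@jax.jit
def matmul(a, b):
    return a @ b

x = jnp.ones((500, 500))
y = matmul(x, x)                 # returns without waiting; work is dispatched asynchronously
result = y.block_until_ready()   # blocks until the computation actually finishes
```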

  • Device: A generic term for a CPU, GPU, or TPU.

  • DeviceArray: JAX’s analog of the numpy.ndarray.

  • jaxpr: JAX Expressions (or jaxpr) are the intermediate representations of a computation graph.
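For example, jax.make_jaxpr shows the jaxpr that JAX traces for a function (a small sketch):

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x) * 2.0

expr = jax.make_jaxpr(f)(1.0)
print(expr)   # prints the traced primitives, e.g. sin and mul
```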

  • JIT: Just In Time compilation. It is performed in JAX using XLA.

  • Pytrees: A tree-like structure built out of container-like Python objects.
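For instance, a nested dict of lists and tuples is a pytree, and jax.tree_util.tree_map applies a function to every leaf while preserving the structure (a minimal sketch):

```python
import jax

params = {"w": [1.0, 2.0], "b": (3.0,)}               # a pytree of container objects
doubled = jax.tree_util.tree_map(lambda v: v * 2, params)   # same structure, leaves doubled
```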

  • static: An argument or value evaluated at compile time during JIT compilation, and therefore not traced.

  • TPU: Tensor Processing Unit, Google’s custom accelerator for machine learning workloads.

  • Tracer: An object that records the sequence of operations performed by a Python function.

  • Transformation: A higher-order function that takes a function as input and returns a transformed function, like jit(), grad(), pmap(), and so on.
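As a small illustration, grad() takes a Python function and returns a new function that computes its derivative:

```python
import jax

def square(x):
    return x ** 2

d_square = jax.grad(square)   # a transformed function, not a value
deriv = d_square(3.0)         # derivative 2*x evaluated at x = 3.0
```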

  • VJP: Vector-Jacobian Product, the opposite of JVP; it is used for reverse-mode auto differentiation.

  • XLA: Accelerated Linear Algebra, a domain-specific compiler for linear algebra. It is used for JAX’s JIT compilation.

  • Weak type: A JAX data type having the same type promotion semantics as Python scalars.
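A quick way to see weak typing (a sketch, assuming JAX's default 32-bit mode): an array created from a bare Python scalar is weakly typed, while one created with an explicit dtype is not.

```python
import jax.numpy as jnp

a = jnp.asarray(1.0)                      # from a Python scalar: weakly typed
b = jnp.asarray(1.0, dtype=jnp.float32)   # explicit dtype: strongly typed
```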


  • Adam: A widely used algorithm for stochastic optimization.

  • Auto-differentiation: A technique for calculating derivatives using the chain rule.

  • Batch Normalization: A technique enabling faster training of deep neural networks by rescaling and centering layer inputs so they have zero mean and unit variance, i.e., N(0, 1).

  • Convolution: A mathematical operation expressing how the shape of a signal f is modified by another signal g.

  • Cumulative Distribution Function (CDF): The probability that a random variable takes a value less than or equal to a given value x.

  • Gaussian (Normal) distribution: The famous bell-shaped probability distribution, used to model many real-world phenomena.

  • Gradient Clipping: A technique that stabilizes and speeds up the training of deep neural networks by rescaling gradients whose norm exceeds a threshold.
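One common variant, clipping by norm, can be sketched as follows (clip_by_norm is a hypothetical helper written for illustration, not a JAX API):

```python
import jax.numpy as jnp

def clip_by_norm(g, max_norm):
    """Rescale g so its L2 norm does not exceed max_norm."""
    norm = jnp.linalg.norm(g)
    return jnp.where(norm > max_norm, g * (max_norm / norm), g)

clipped = clip_by_norm(jnp.asarray([3.0, 4.0]), 1.0)   # norm 5.0 -> rescaled to norm 1.0
```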

  • Hessian: The matrix of second-order partial derivatives of a scalar-valued function.

  • Jacobian: The matrix of first-order partial derivatives of a vector-valued function.
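Both matrices are easy to compute in JAX (a minimal sketch; f is a toy function chosen for illustration):

```python
import jax
import jax.numpy as jnp

def f(x):                       # vector-valued function R^2 -> R^2
    return jnp.asarray([x[0] ** 2, x[0] * x[1]])

J = jax.jacfwd(f)(jnp.asarray([2.0, 3.0]))                            # 2x2 Jacobian
H = jax.hessian(lambda x: jnp.sum(x ** 2))(jnp.asarray([1.0, 2.0]))   # 2x2 Hessian
```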

  • JVP: Jacobian-Vector Product, used to implement forward-mode auto differentiation.
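A small sketch of both modes on sin(x) at x = 0, where the derivative cos(0) = 1:

```python
import jax
import jax.numpy as jnp

f = jnp.sin
primal, tangent = jax.jvp(f, (0.0,), (1.0,))     # forward mode: Jacobian @ vector
primal2, vjp_fn = jax.vjp(f, 0.0)
(cotangent,) = vjp_fn(jnp.asarray(1.0))          # reverse mode: vector @ Jacobian
```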

  • Kullback-Leibler (KL) Divergence: A commonly-used, asymmetric divergence measure.

  • Poisson distribution: A discrete probability distribution that models the number of rare events occurring in a fixed interval.

  • PRNG: Pseudo-Random Number Generator, a key component of JAX and other numerical computation libraries.
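JAX makes the PRNG state explicit: a key is created, split, and passed to sampling functions (a minimal sketch):

```python
import jax

key = jax.random.PRNGKey(0)              # explicit random state
key, subkey = jax.random.split(key)      # never reuse a key; split it instead
sample = jax.random.normal(subkey, shape=(3,))
```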

  • Probability Density Function (PDF): The derivative of the CDF at a given point. Its value can be intuitively treated as a measure of probability density around that point.

  • Probability Mass Function (PMF): Discrete counterpart of PDF.
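The CDF and PDF definitions above can be checked numerically with jax.scipy.stats: for a standard normal, the CDF at 0 is 0.5, and the PDF at 0 is 1/sqrt(2*pi) ≈ 0.3989 (a small sketch):

```python
from jax.scipy.stats import norm

cdf_at_zero = norm.cdf(0.0)   # P(X <= 0) for a standard normal
pdf_at_zero = norm.pdf(0.0)   # density at 0, equal to 1/sqrt(2*pi)
```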

  • Transposed Convolution: An operation that reverses the spatial downsampling of a convolution, producing an upsampled output.

  • Wasserstein GAN (WGAN): A type of GAN using Wasserstein loss.

  • Wasserstein Loss: A loss function based on the optimal transport problem (hence also known as the earth-mover distance).
