# Glossary

Some common terms used throughout the course

As a reference, we’ll provide a glossary of some common technical terms used in the course.

## JAX

Some common JAX terms are presented here. These definitions may differ slightly from how the terms are used elsewhere.

• Asynchronous Dispatch: The behavior in which a jitted function returns control to Python before the computation finishes, allowing the next operation to be dispatched while the device is still working.

• Device: A generic term for a CPU, GPU, or TPU.

• DeviceArray: JAX's analog of numpy.ndarray.

• jaxpr: JAX Expressions (or jaxpr) are the intermediate representations of a computation graph.

• JIT: Just In Time compilation. It is performed in JAX using XLA.

• Pytrees: A tree-like structure built out of container-like Python objects.

• static: A value treated as a compile-time constant during JIT compilation and hence not traced.

• TPU: Tensor Processing Unit, an accelerator designed by Google for machine learning workloads.

• Tracer: An object that records the sequence of operations performed by a Python function so that the function can be transformed or compiled.

• Transformation: A higher-order function that takes a function as input and returns a transformed function, such as jit(), grad(), or pmap().

• VJP: Vector-Jacobian Product, the counterpart of the JVP, used for reverse-mode automatic differentiation.

• XLA: Accelerated Linear Algebra, a domain-specific compiler for linear algebra, used by JAX for JIT compilation.

• Weak type: A JAX data type having the same type promotion semantics as Python scalars.
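Several of these terms can be seen together in a short, self-contained sketch (illustrative only; the function names `loss`, `grad_loss`, and `fast_grad` are made up for this example):

```python
import jax
import jax.numpy as jnp

# A pure function: mean squared error of a linear model.
def loss(w, x, y):
    return jnp.mean((x @ w - y) ** 2)

# Transformation: grad() takes a function and returns a new
# function that computes dloss/dw.
grad_loss = jax.grad(loss)

# Transformation: jit() compiles the function with XLA.
fast_grad = jax.jit(grad_loss)

# PRNG: JAX uses explicit keys instead of hidden global state.
key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
x = jax.random.normal(k1, (8, 3))
w = jax.random.normal(k2, (3,))
y = jnp.ones(8)

# The call returns a device array; due to asynchronous dispatch,
# block_until_ready() waits for the computation to finish.
g = fast_grad(w, x, y).block_until_ready()

# jaxpr: inspect the intermediate representation of the tracing step.
print(jax.make_jaxpr(loss)(w, x, y))
```

Here `jit` and `grad` are transformations, the printed jaxpr is the traced intermediate representation, and the explicit `key` illustrates JAX's functional PRNG design.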

## Theory

• Adam: A widely used algorithm for stochastic gradient-based optimization.

• Auto-differentiation: A technique for calculating derivatives using the chain rule.

• Batch Normalization: A technique that enables faster training of deep neural networks by rescaling and centering layer activations to have zero mean and unit variance, i.e., approximately $\mathcal N(0,1)$.

• Convolution: A mathematical operation expressing how the shape of a signal $f$ is modified by another signal $g$.

• Cumulative Distribution Function (CDF): The probability that a random variable takes a value less than or equal to a given value $x$.

• Gaussian (Normal) distribution: The famous bell-shaped probability distribution, used to model many real-world phenomena.

• Gradient Clipping: A technique that stabilizes and speeds up training of deep neural networks by rescaling gradients whose norm exceeds a given threshold.

• Hessian: The matrix of second-order partial derivatives of a scalar-valued function.

• Jacobian: The matrix of first-order partial derivatives of a vector-valued function.

• JVP: Jacobian-Vector Product, used to implement forward-mode automatic differentiation.

• Kullback-Leibler (KL) Divergence: A commonly used, asymmetric measure of the divergence between two probability distributions.

• Poisson distribution: A discrete probability distribution used to model the number of rare events occurring in a fixed interval.

• PRNG: Pseudo-Random Number Generator, a key component of JAX and other numerical-computation libraries.

• Probability Density Function (PDF): The derivative of the CDF at a given point; its value can intuitively be treated as a measure of the probability of values near that point.

• Probability Mass Function (PMF): Discrete counterpart of PDF.

• Transposed Convolution: An operation that reverses the spatial downsampling of a convolution, producing an upsampled output.

• Wasserstein GAN (WGAN): A type of GAN using Wasserstein loss.

• Wasserstein Loss: A loss function based on the optimal transport problem (hence also known as earth-mover distance).
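The relationship between the Jacobian, the JVP, and the VJP can be made concrete with a small sketch in JAX (the function `f` and the vectors chosen here are illustrative, not part of the course material):

```python
import jax
import jax.numpy as jnp

# A vector-valued function f: R^3 -> R^2.
def f(x):
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

x = jnp.array([1.0, 2.0, 0.0])

# Jacobian: the (2, 3) matrix of first-order partial derivatives.
J = jax.jacfwd(f)(x)

# JVP (forward mode): push a tangent vector v through f; equals J @ v.
v = jnp.array([1.0, 0.0, 0.0])
_, jvp_out = jax.jvp(f, (x,), (v,))

# VJP (reverse mode): pull a cotangent vector u back through f; equals u @ J.
_, vjp_fn = jax.vjp(f, x)
u = jnp.array([1.0, 1.0])
(vjp_out,) = vjp_fn(u)

# Hessian: second-order derivatives of a scalar-valued function.
def g(x):
    return jnp.sum(x ** 2)

H = jax.hessian(g)(x)  # 2 * identity, shape (3, 3)
```

Forward mode (JVP) computes one column-space product per tangent vector, while reverse mode (VJP) computes one row-space product per cotangent vector, which is why reverse mode underlies gradient computation for scalar losses.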