As a reference, we’ll provide a glossary of some common technical terms used in the course.
JAX
Some common JAX terms are presented here. Their usage in JAX may differ slightly from how the same terms are used elsewhere.

Asynchronous Dispatch: The behavior in which a jitted function returns control to Python before the computation finishes, letting subsequent operations be queued while the result is still being computed.
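A quick way to see asynchronous dispatch is to force synchronization with `block_until_ready()`:

```python
import jax.numpy as jnp

x = jnp.ones((1000, 1000))
y = jnp.dot(x, x)       # returns immediately; the computation runs asynchronously
y.block_until_ready()   # explicitly wait for the result to be computed
```

Timing JAX code without `block_until_ready()` measures only dispatch time, not the computation itself.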

Device: A generic term for a CPU, GPU, or TPU.

DeviceArray: JAX's analog of numpy.ndarray.

jaxpr: JAX Expressions (jaxprs) are the intermediate representation of a computation graph.
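You can inspect the jaxpr that JAX traces for a function with `jax.make_jaxpr`:

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x) ** 2

# Prints the traced intermediate representation of f
print(jax.make_jaxpr(f)(1.0))
```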
JIT: Just-In-Time compilation. In JAX it is performed using XLA.

Pytrees: A tree-like structure built out of container-like Python objects (e.g., lists, tuples, and dicts).
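For example, a nested dict of lists and tuples is a pytree, and `jax.tree_util` operates on its leaves:

```python
import jax

tree = {"w": [1.0, 2.0], "b": (3.0,)}
leaves = jax.tree_util.tree_leaves(tree)            # flat list of leaf values
doubled = jax.tree_util.tree_map(lambda v: v * 2, tree)  # same structure, leaves doubled
```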

static: An argument treated as a compile-time constant during JIT compilation, and hence not traced.
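Static arguments are declared with `static_argnums`; the function recompiles for each new static value:

```python
import jax
import jax.numpy as jnp
from functools import partial

@partial(jax.jit, static_argnums=0)
def make_zeros(n):
    # n is static: it can be used in shape computations,
    # but each new value of n triggers a recompilation.
    return jnp.zeros((n, n))
```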
TPU: Tensor Processing Unit, Google's custom accelerator for machine learning workloads.

Tracer: An abstract object that JAX substitutes for concrete values in order to record the sequence of operations performed by a Python function.

Transformation: A higher-order function that takes a function as input and returns a transformed function, such as jit(), grad(), pmap(), and so on.
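Each transformation produces a new function rather than a result:

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sum(x ** 2)

grad_f = jax.grad(f)            # a new function computing df/dx
fast_f = jax.jit(f)             # a new function compiled with XLA
batched = jax.vmap(jnp.square)  # a new function mapped over a leading axis
```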
VJP: Vector-Jacobian Product, the counterpart of JVP; used for reverse-mode automatic differentiation.

XLA: Accelerated Linear Algebra, a domain-specific compiler for linear algebra. It is used by JAX for JIT compilation.

Weak type: A JAX data type having the same type promotion semantics as Python scalars.
Theory

Adam: A widely used algorithm for stochastic gradient-based optimization.
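The update rule can be sketched as follows; `adam_step` is an illustrative single-parameter version, not a library API (in practice you would use an optimizer library such as Optax):

```python
import jax.numpy as jnp

def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Illustrative, simplified Adam update for a single parameter.
    m = b1 * m + (1 - b1) * grad        # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2   # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)           # bias correction
    v_hat = v / (1 - b2 ** t)
    return param - lr * m_hat / (jnp.sqrt(v_hat) + eps), m, v
```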

Auto-differentiation: A technique for calculating derivatives of programs by repeatedly applying the chain rule.
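In JAX this is exposed through `jax.grad`, which differentiates a Python function directly:

```python
import jax

f = lambda x: x ** 3 + 2 * x
df = jax.grad(f)   # derivative: 3x^2 + 2
df(2.0)            # 14.0
```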

Batch Normalization: A technique enabling faster training of deep neural networks by normalizing layer inputs to zero mean and unit variance, i.e., $\mathcal N(0,1)$.
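A minimal sketch of the normalization step (`batch_norm` is an illustrative helper; real implementations also track running statistics for inference):

```python
import jax.numpy as jnp

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize over the batch axis to zero mean and unit variance,
    # then apply a learnable scale (gamma) and shift (beta).
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / jnp.sqrt(var + eps) + beta
```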

Convolution: A mathematical operation expressing how the shape of a signal $f$ is modified by another signal $g$.
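For 1-D signals this is available directly as `jnp.convolve`:

```python
import jax.numpy as jnp

f = jnp.array([1.0, 2.0, 3.0])
g = jnp.array([0.0, 1.0, 0.5])
out = jnp.convolve(f, g)   # full convolution of the two signals
```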

Cumulative Distribution Function (CDF): The probability that a random variable takes a value less than or equal to a given value $x$.
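For instance, the standard normal CDF evaluated at its mean is 0.5, since half the probability mass lies below it:

```python
from jax.scipy.stats import norm

p = norm.cdf(0.0)   # P(X <= 0) for a standard normal: 0.5
```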

Gaussian (Normal) distribution: The famous bell-shaped probability distribution used to model many real-world phenomena.

Gradient Clipping: A technique that stabilizes training of deep neural networks by rescaling gradients whose norm exceeds a threshold.
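A minimal sketch of clipping by norm (`clip_by_norm` is an illustrative helper, not a library function):

```python
import jax.numpy as jnp

def clip_by_norm(grads, max_norm=1.0):
    # Rescale the gradient vector if its norm exceeds max_norm;
    # leave it unchanged otherwise.
    norm = jnp.linalg.norm(grads)
    scale = jnp.minimum(1.0, max_norm / norm)
    return grads * scale
```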

Hessian: The matrix of second-order partial derivatives of a scalar-valued function.
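JAX computes it directly with `jax.hessian`:

```python
import jax
import jax.numpy as jnp

f = lambda x: jnp.sum(x ** 3)
# Hessian of sum(x_i^3) is the diagonal matrix diag(6 * x_i)
H = jax.hessian(f)(jnp.array([1.0, 2.0]))
```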

Jacobian: The matrix of first-order partial derivatives of a vector-valued function.
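It can be computed with `jax.jacfwd` (forward mode) or `jax.jacrev` (reverse mode):

```python
import jax
import jax.numpy as jnp

f = lambda x: jnp.array([x[0] * x[1], jnp.sin(x[0])])
J = jax.jacfwd(f)(jnp.array([1.0, 2.0]))   # 2x2 matrix of partial derivatives
```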

JVP: Jacobian-Vector Product; used to implement forward-mode automatic differentiation.
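With `jax.jvp`, the primal output and the Jacobian-vector product are computed in a single forward pass:

```python
import jax

f = lambda x: x ** 2
# Evaluate f at 3.0 and push the tangent 1.0 through its Jacobian
y, tangent = jax.jvp(f, (3.0,), (1.0,))   # y = 9.0, tangent = 2*x*v = 6.0
```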

Kullback-Leibler (KL) Divergence: A commonly used, asymmetric measure of the difference between two probability distributions.

Poisson distribution: A discrete probability distribution used to model the number of events occurring in a fixed interval, typically rare events.

PRNG: Pseudo-Random Number Generator, a key component of JAX and other numerical computing libraries.
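Unlike NumPy, JAX's PRNG is stateless: random functions take an explicit key, which is split before each use:

```python
import jax

key = jax.random.PRNGKey(0)           # explicit PRNG state
key, subkey = jax.random.split(key)   # split before each use to avoid reuse
sample = jax.random.normal(subkey, shape=(3,))
```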

Probability Density Function (PDF): The derivative of the CDF at a given point. Its value can intuitively be treated as a measure of probability around that point.

Probability Mass Function (PMF): The discrete counterpart of the PDF.

Transposed Convolution: An operation that reverses the spatial downsampling of a convolution, producing an upsampled output.

Wasserstein GAN (WGAN): A type of GAN trained with the Wasserstein loss.

Wasserstein Loss: A loss function based on the optimal transport problem (hence also known as the earth mover's distance).