Enhancing Autoencoder Models

Discover how to create well-posed autoencoders with orthonormal weights and sparse feature covariances.

Autoencoders have proven useful for unsupervised and semi-supervised learning. Earlier, in the lesson on the autoencoder family, we presented a variety of ways to model autoencoders. Still, there is significant room for new development.

This lesson is intended for researchers seeking directions for new development.

Well-posed autoencoders

A mathematically well-posed autoencoder is easier to tune and optimize. Its structure can be defined from its relationship with principal component analysis (PCA).

A linearly activated autoencoder approximates PCA. Viewed the other way, an autoencoder is a nonlinear extension of PCA: it extends PCA to a nonlinear space. An autoencoder should therefore ideally retain the properties of PCA. These properties are:

  • Orthonormal weights: The encoder weights should satisfy (see the first sketch after this list):

    \begin{align*} &W^T W = I, \space \text{and} \\ &\sum_{j=1}^{p} w_{ij}^2 = 1, \quad i = 1, \ldots, k \end{align*}

    where $I$ is a $p \times p$ identity matrix, $p$ is the number of input features, and $k$ is the number of nodes in an encoder layer.

  • Independent features: Principal component analysis yields independent (uncorrelated) features. This can be seen by computing the covariance of the principal scores $Z = XW$ (see the second sketch after this list),

    \begin{align*} \mathrm{cov}(Z) &\propto (XW)^T (XW) \\ &= W^T X^T X W \\ &\propto W^T W \Lambda W^T W \\ &= \Lambda \end{align*}

    where $\Lambda$ is the diagonal matrix of eigenvalues in the eigendecomposition $X^T X \propto W \Lambda W^T$, and the last step uses $W^T W = I$. Since $\Lambda$ is diagonal, the covariance of the scores is diagonal, i.e., the features are uncorrelated.
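
The orthonormality and unit-norm conditions above can be encouraged directly during training. The following is a minimal sketch, not taken from this lesson, assuming TensorFlow/Keras: a linearly activated encoder layer whose kernel is kept unit-norm per node with a built-in constraint and pushed toward $W^T W \approx I$ with a custom regularizer. The sizes `p` and `k` are illustrative.

```python
import tensorflow as tf
from tensorflow import keras


class OrthonormalWeights(keras.regularizers.Regularizer):
    """Penalty encouraging the encoder kernel to stay near-orthonormal."""

    def __init__(self, strength=1e-3):
        self.strength = strength

    def __call__(self, w):
        # Keras stores the Dense kernel as p x k, so W^T W here is k x k
        # (the lesson's k x p convention would compare against a p x p identity).
        wtw = tf.matmul(w, w, transpose_a=True)
        identity = tf.eye(tf.shape(w)[1], dtype=w.dtype)
        return self.strength * tf.reduce_sum(tf.square(wtw - identity))

    def get_config(self):
        return {"strength": self.strength}


p, k = 20, 5  # illustrative: p input features, k encoder nodes

# Linearly activated encoder: unit-norm weights per node (the sum constraint)
# plus a soft penalty pushing W^T W toward the identity (orthonormality).
encoder_layer = keras.layers.Dense(
    k,
    activation="linear",
    kernel_constraint=keras.constraints.UnitNorm(axis=0),
    kernel_regularizer=OrthonormalWeights(strength=1e-3),
)
decoder_layer = keras.layers.Dense(p, activation="linear")

inputs = keras.Input(shape=(p,))
autoencoder = keras.Model(inputs, decoder_layer(encoder_layer(inputs)))
autoencoder.compile(optimizer="adam", loss="mse")
```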
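
Whether a trained encoder actually yields near-independent features can be checked empirically. A minimal sketch, assuming NumPy and an already-fitted encoder (`encoder_model` and the data matrix `X` below are hypothetical): compute the sample covariance of the encodings and measure how much of it lies off the diagonal, mirroring the diagonal $\mathrm{cov}(Z) = \Lambda$ obtained for PCA scores above.

```python
import numpy as np


def encoding_covariance(encode_fn, X):
    """Sample covariance of the encoded features Z = encode_fn(X)."""
    Z = np.asarray(encode_fn(X))
    Zc = Z - Z.mean(axis=0, keepdims=True)   # center each encoded feature
    return (Zc.T @ Zc) / (Zc.shape[0] - 1)   # k x k covariance matrix


def off_diagonal_share(cov):
    """Share of total absolute covariance that lies off the diagonal."""
    total = np.abs(cov).sum()
    return (total - np.abs(np.diag(cov)).sum()) / total


# Hypothetical usage with a fitted encoder model and a data matrix X (p columns):
# cov_z = encoding_covariance(encoder_model.predict, X)
# print(off_diagonal_share(cov_z))  # near 0 => nearly uncorrelated encodings
```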