
Maximizing Efficiency with Complete Statistics

Understand the concept of complete statistics and how they relate to minimal sufficient statistics and maximum likelihood estimators. Explore properties like unbiasedness and minimum variance that make certain statistics ideal for pooling in convolutional neural networks. Learn about ancillary statistics and their complementary role in improving model efficiency and stability.

Complete statistics

The many possible choices of minimal sufficient statistics can make selection confusing. This section introduces complete statistics, which narrow the choice of pooling statistic down to the maximum likelihood estimator (MLE) of the feature map distribution.

A complete statistic is a bridge between minimal sufficient statistics and maximum likelihood estimators (MLEs). MLEs derived from complete minimal statistics have the essential attributes of unbiasedness and minimum variance along with the minimality and completeness properties. MLEs, therefore, become the natural choice for pooling, which removes most of the ambiguity around pooling statistic selection.

Next, we lay out the attributes and the path that leads to the relationship between complete minimal statistics and the MLE.

Completeness

Let $f(t \mid \theta)$ be a family of pdfs or pmfs for a statistic $T(X)$. The family of probability distributions is called complete if, for every measurable, real-valued function $g$, $E_{\theta}(g(T)) = 0$ for all $\theta \in \Omega$ implies $g(T) = 0$ with respect to $\theta$, that is, $P_{\theta}(g(T) = 0) = 1$ for all $\theta$. The statistic $T$ is boundedly complete if $g$ is bounded.
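In display form, the completeness condition reads:

$$E_{\theta}\big(g(T)\big) = 0 \ \text{ for all } \theta \in \Omega \quad \Longrightarrow \quad P_{\theta}\big(g(T) = 0\big) = 1 \ \text{ for all } \theta \in \Omega.$$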

In simple words, a probability distribution is complete if the probability of the statistic $T(X)$, computed from an observed sample $X = X_1, \ldots, X_n$, is always non-zero under the distribution, so the only way $E_{\theta}(g(T))$ can vanish for every $\theta$ is for $g(T)$ itself to be zero.

This becomes clearer in the discrete case. Here, completeness means that $E_{\theta}(g(T)) = \sum_{t} g(t)\,P_{\theta}(T = t) = 0$ implies $g(T) = 0$, because by definition $P_{\theta}(T = t)$ is non-zero.
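As a standard illustration of the discrete case (an added example, separate from the feature map setting): let $X_1, \ldots, X_n$ be Bernoulli($p$) with $0 < p < 1$, and let $T(X) = \sum X_i \sim \text{Binomial}(n, p)$. Then

$$E_{p}(g(T)) = \sum_{t=0}^{n} g(t) \binom{n}{t} p^{t} (1-p)^{n-t} = (1-p)^{n} \sum_{t=0}^{n} g(t) \binom{n}{t} \left(\frac{p}{1-p}\right)^{t}.$$

If this equals zero for every $p \in (0, 1)$, the polynomial in $r = p/(1-p)$ is identically zero, so every coefficient $g(t)\binom{n}{t}$ vanishes, forcing $g(t) = 0$ for $t = 0, \ldots, n$. Hence $T$ is complete.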

For example, suppose $X_1, \ldots, X_n$ is observed from a normal distribution $N(\mu, 1)$, and there is a statistic $T(X) = \sum X_i$. The density $f_{\mu}(t)$ of $T(X)$ is non-zero for every $t$ and every $\mu$. Therefore, $E_{\mu}(g(T)) = \int g(t)\,f_{\mu}(t)\,dt = 0$ for all $\mu$ implies $g(T) = 0$, so $T(X)$ is a complete statistic.
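The same point can be checked numerically. Below is a minimal Monte Carlo sketch, assuming NumPy; the test function $g(t) = t$ and the contrasting statistic $X_1 - X_2$ are illustrative choices added here, not part of the definition above. It shows that $E_{\mu}[T]$ tracks $n\mu$, so this nonzero $g$ can never make $E_{\mu}(g(T))$ vanish for every $\mu$, whereas $E_{\mu}[X_1 - X_2] = 0$ for every $\mu$ even though $g$ is not the zero function, which is exactly why $X_1 - X_2$ is not complete.

```python
import numpy as np

# Monte Carlo sketch (illustrative assumptions: g(t) = t, contrast statistic X1 - X2).
# Completeness asks: if E_mu[g(T)] = 0 for ALL mu, must g(T) be 0?
#   * For T(X) = sum(X_i), E_mu[T] = n * mu, nonzero whenever mu != 0,
#     so the nonzero choice g(t) = t never satisfies the premise.
#   * For X1 - X2, E_mu[X1 - X2] = 0 for every mu although g is not the zero
#     function, so X1 - X2 is NOT complete.

rng = np.random.default_rng(0)
n, reps = 5, 200_000

for mu in (-1.0, 0.0, 2.0):
    x = rng.normal(loc=mu, scale=1.0, size=(reps, n))  # reps samples of size n
    T = x.sum(axis=1)                                   # T(X) = sum of the sample
    D = x[:, 0] - x[:, 1]                               # statistic that is not complete
    print(f"mu = {mu:+.1f}:  E[g(T)] ~ {T.mean():+.3f} (theory {n * mu:+.1f}),"
          f"  E[g(X1 - X2)] ~ {D.mean():+.3f} (theory +0.0)")
```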