Efficient Pooling Strategies
Explore the role of pooling in convolutional neural networks by understanding how efficient summary statistics, such as sufficient and minimal sufficient statistics, preserve key information during feature extraction. This lesson explains the theory behind pooling, including common methods like max-pooling and average-pooling, and guides you on choosing or combining the best statistics to enhance CNN performance for rare event prediction.
The strength of a convolutional network is its ability to simplify the feature extraction process. Pooling plays a critical role in this by removing extraneous information. A pooling operation condenses the features in a region into a single summary statistic, so whether it preserves the relevant information or loses it depends on the efficiency of that statistic.
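To make the idea of pooling as a summary statistic concrete, here is a minimal, dependency-free sketch. The helper `pool2d`, the `average` function, and the small `feature_map` are hypothetical names and values chosen for illustration; they are not from any particular library. The sketch assumes non-overlapping square windows and passes the statistic in as an ordinary function.

```python
def average(window):
    """Sample mean of the values in a pooling window."""
    return sum(window) / len(window)


def pool2d(feature_map, size, statistic):
    """Summarize each non-overlapping size x size window with `statistic`.

    `feature_map` is a list of equal-length rows. Trailing rows or columns
    that do not fill a complete window are dropped (an assumption made here
    to keep the sketch short).
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Flatten the window into a plain sample of observations.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(statistic(window))
        pooled.append(pooled_row)
    return pooled


# Illustrative 4x4 feature map (made-up values).
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 3, 7, 8],
]

max_pooled = pool2d(feature_map, 2, max)      # [[4, 2], [3, 8]]
avg_pooled = pool2d(feature_map, 2, average)  # [[2.5, 1.0], [1.0, 6.5]]
```

Because the statistic is just a function of the window's values, swapping `max` for `average` (or any other statistic) changes what information survives pooling without changing the rest of the operation.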
What’s an efficient summary statistic?
A summary statistic is a construct from the principles of data reduction. It summarizes a set of observations so as to preserve as much information as possible, as succinctly as possible.
Therefore, an efficient summary statistic is one that concisely contains the most information about a sample, such as the sample mean or maximum. Other statistics, like the sample skewness or sample size, do not contain as much relevant information and are therefore not efficient for pooling. This lesson lays out the theory of summary statistics in order to identify efficient statistics for pooling.
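As a small illustration of why some statistics carry less relevant information than others, consider two hypothetical pooling windows that differ only in one strong activation. The values and names below are invented for this sketch. The sample size is identical for both windows and so cannot tell them apart, while the mean and the maximum both reflect the difference.

```python
def sample_mean(xs):
    """Arithmetic mean of a sample."""
    return sum(xs) / len(xs)


# Two made-up pooling windows of the same size.
window_a = [1, 2, 1, 2]  # only weak activations
window_b = [1, 2, 1, 9]  # one strong activation

statistics_to_compare = {"size": len, "mean": sample_mean, "max": max}

# For each statistic, record its value on window_a and on window_b.
summary = {
    name: (stat(window_a), stat(window_b))
    for name, stat in statistics_to_compare.items()
}

for name, (value_a, value_b) in summary.items():
    print(f"{name}: window_a={value_a}, window_b={value_b}")
```

This is only a toy comparison under the stated assumptions; it says nothing about which of the mean or the maximum is preferable for a given problem.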
“An experimenter might wish to summarize the information in a sample by determining a few key features of the sample values. This is usually done by computing (summary) statistics—functions of the sample.” (Casella and Berger 2002)
Understanding how pooling depends on the efficiency of summary statistics, and the theory behind them, is rewarding. It provides answers to questions like:
- Currently, max-pool and average-pool are the most common. Could there be other equally or more effective pooling statistics?
- Max-pool is found to be robust and, therefore, better than others in most problems. What is the cause of max pooling’s robustness?
- Can more than one pooling statistic be used together? If yes, how to find the best combination of ...