The Sentence Classification CNN Model

Learn about the architecture of the CNN-based model for sentence classification.

Now, we’ll look at the technical details of the CNN used for sentence classification. First, we’ll discuss how data or sentences are transformed into a preferred format that can easily be dealt with by CNNs. Next, we’ll discuss how the convolution and pooling operations are adapted for sentence classification, and finally, we’ll discuss how all these components are connected.

The convolution operation

If we ignore the batch size, that is, if we assume that we are only processing a single sentence at a time, our data is a n×k n\times k matrix, where nn is the number of words per sentence after padding, and kk is the dimension of a single word vector. In our example, this would be 7×137 \times 13.

Now, we’ll define our convolution weight matrix to be of size m×km \times k, where mm is the filter size for a 1D convolution operation. By convolving the input xx of size n×kn \times k with a weight matrix WW of size m×km \times k, we’ll produce an output of hh of size 1×n1 \times n as follows:

Get hands-on with 1200+ tech skills courses.