Sentence Classification with CNNs

Learn to implement the CNN-based model for sentence classification.

We’re now ready to implement the model in TensorFlow 2. As a prerequisite, let’s import several necessary modules from TensorFlow:

import tensorflow.keras.backend as K
import tensorflow.keras.layers as layers
import tensorflow.keras.regularizers as regularizers
from tensorflow.keras.models import Model

Clear the running session to make sure previous runs are not interfering with the current run:

K.clear_session()

We’ll build this model with the Keras functional API. The sequential API only supports a single linear stack of layers, whereas this model feeds the output of one layer into several parallel branches. Let’s start off by creating an input layer:

# Input layer takes word IDs as inputs
word_id_inputs = layers.Input(shape=(max_seq_length,), dtype='int32')
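As a rough illustration of what this input layer expects, the sketch below forces every sequence of word IDs to the same length. The helper name `pad_or_truncate`, the pad ID of 0, and the choice to pad and truncate at the end of the sequence are assumptions made for this example; the preprocessing used elsewhere in the lesson may differ.

```python
def pad_or_truncate(word_ids, max_len, pad_id=0):
    """Return exactly max_len IDs (hypothetical helper, not part of the lesson)."""
    clipped = word_ids[:max_len]                           # drop IDs beyond max_len
    return clipped + [pad_id] * (max_len - len(clipped))   # right-pad short sequences

max_len = 5  # stand-in for max_seq_length
sequences = [[4, 9, 2], [7, 1, 3, 8, 6, 5, 2]]
batch = [pad_or_truncate(s, max_len) for s in sequences]
print(batch)  # [[4, 9, 2, 0, 0], [7, 1, 3, 8, 6]]
```

Every row now has the same length, which is what lets a batch be passed in as a single integer array of shape [batch_size, max_seq_length].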

The input layer takes a batch of sequences of word IDs, where each sequence is padded or truncated to max_seq_length. We specify the dtype as int32 because word IDs are integers. Next, we define an embedding layer from which we’ll look up the embeddings corresponding to the word IDs coming through the word_id_inputs layer:

# Get the embeddings of the inputs / out [batch_size, sent_length, output_dim]
embedding_out = layers.Embedding(input_dim=n_vocab, output_dim=64)(word_id_inputs)
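To make the lookup concrete, here is a minimal NumPy sketch of the same idea. The toy sizes, the initialization range, and the variable names are assumptions chosen for illustration; this is a stand-in for the lookup, not how Keras implements it internally.

```python
import numpy as np

n_vocab, emb_size = 10, 4  # toy sizes; the lesson uses output_dim=64
rng = np.random.default_rng(0)

# Randomly initialized matrix with one row (word vector) per word ID.
# The range used here is arbitrary and chosen only for the sketch.
embedding_matrix = rng.uniform(-0.05, 0.05, size=(n_vocab, emb_size))

# A batch of padded word-ID sequences: [batch_size, sent_length]
word_ids = np.array([[4, 9, 2, 0, 0],
                     [7, 1, 3, 8, 6]])

# Integer-array indexing selects one row per word ID
embedded = embedding_matrix[word_ids]  # [batch_size, sent_length, emb_size]
print(embedded.shape)  # (2, 5, 4)
```

The layers.Embedding layer above performs the same kind of row lookup, with the matrix stored as a trainable weight of the model.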

The embedding layer is randomly initialized. It holds a matrix of size [n_vocab, 64], where row i is the word vector of the word with ID i. The embeddings are learned jointly with the rest of the model while it is trained on the supervised task.

Next, we define three one-dimensional convolution layers with kernel (filter) sizes of 3, 4, and 5, each producing 100 feature maps:

# For all layers: in [batch_size, sent_length, emb_size] / out [batch_size, sent_length, 100]
conv1_1 = layers.Conv1D(100, kernel_size=3, strides=1, padding='same', activation='relu')(embedding_out)
conv1_2 = layers.Conv1D(100, kernel_size=4, strides=1, padding='same', activation='relu')(embedding_out)
conv1_3 = layers.Conv1D(100, kernel_size=5, strides=1, padding='same', activation='relu')(embedding_out)
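The shape comment above says that all three layers keep the sentence length unchanged. The NumPy sketch below shows why that holds for padding='same' with a stride of 1, using a single sentence instead of a batch. The function `conv1d_same` and the random weights are hypothetical stand-ins for illustration, not the Keras implementation.

```python
import numpy as np

def conv1d_same(x, kernel, bias):
    """1D convolution over time with zero 'same' padding, stride 1, and ReLU.

    x:      [sent_length, emb_size]
    kernel: [kernel_size, emb_size, n_filters]
    bias:   [n_filters]
    """
    k = kernel.shape[0]
    pad_left = (k - 1) // 2
    pad_right = (k - 1) - pad_left
    # Zero-pad the time axis only, so there is one output step per input word
    x_pad = np.pad(x, ((pad_left, pad_right), (0, 0)))
    out = np.stack([
        # Each window covers k consecutive words and all embedding dimensions
        np.tensordot(x_pad[t:t + k], kernel, axes=([0, 1], [0, 1])) + bias
        for t in range(x.shape[0])
    ])
    return np.maximum(out, 0.0)  # ReLU activation

rng = np.random.default_rng(0)
sent_length, emb_size, n_filters = 7, 64, 100
x = rng.normal(size=(sent_length, emb_size))

shapes = {}
for k in (3, 4, 5):
    kernel = rng.normal(size=(k, emb_size, n_filters))
    shapes[k] = conv1d_same(x, kernel, np.zeros(n_filters)).shape
print(shapes)  # every kernel size gives (7, 100)
```

Because the outputs share the same [sent_length, 100] shape regardless of kernel size, the three branches stay aligned along the time axis.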

An important distinction to make here is that we’re using 1D convolution as opposed to the ...