# The Deep Learning Environment Test

Understand the process of testing a deep learning environment, defining and training models, and calculating their loss and accuracy using provided datasets.

We are going to verify our deep learning environment installation by building and training a simple, fully connected neural network to classify images of handwritten digits from the MNIST dataset. MNIST is an introductory dataset of 70,000 images, small enough that we can train a small model quickly on a CPU and extremely fast on a GPU. In this simple example, we are only interested in testing our deep learning setup.

## Importing Keras libraries

We start by importing what we need from `keras`, including the built-in `mnist` module, which downloads and loads the train and test datasets associated with MNIST:

```python
from keras.datasets import mnist
from keras.models import Sequential
from keras.utils import to_categorical
from keras.optimizers import SGD
from keras.layers import Dense, Activation
```

## Loading and splitting dataset

The training set has 60,000 samples, and the test set has 10,000 samples. The dataset is balanced and shuffled, that is, it has a similar number of samples for each class, and the order of the samples is random:

```python
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print("Train samples {}, Train labels {}".format(X_train.shape, y_train.shape))
print("Test samples {}, Test labels {}".format(X_test.shape, y_test.shape))
```

## Reshaping and scaling data

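The MNIST dataset contains images of 28 by 28 pixels. To feed them to a dense layer, we reshape the data so that each image becomes a flat vector of `h * w` values. The first dimension stays the **batch size**, that is, the number of items in the batch. After reshaping the data, we will convert it to floating point and scale it to the range [-1, 1]: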

```python
# reshape to batch size by height * width
h, w = X_train.shape[1:]
X_train = X_train.reshape(X_train.shape[0], h * w)
X_test = X_test.reshape(X_test.shape[0], h * w)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# scale to [0, 1], scale to [0, 2], offset by -1
X_train = (X_train / 255.0) * 2 - 1
X_test = (X_test / 255.0) * 2 - 1
```
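To make the reshaping and scaling arithmetic concrete, here is a minimal sketch in plain Python that needs neither Keras nor NumPy. The helper names `flatten` and `scale_pixel` and the 2 x 2 toy image are illustrative assumptions, not part of the lesson's code.

```python
def flatten(image):
    """Flatten a 2-D image (a list of rows) into a single list of pixels."""
    return [pixel for row in image for pixel in row]


def scale_pixel(value):
    """Map an 8-bit pixel value from [0, 255] to [-1, 1]."""
    # Divide by 255 to reach [0, 1], multiply by 2 to reach [0, 2], then offset by -1.
    return (value / 255.0) * 2 - 1


# A hypothetical 2 x 2 "image"; real MNIST images are 28 x 28.
toy_image = [[0, 255],
             [51, 204]]

flat = flatten(toy_image)                  # 4 values here, 28 * 28 = 784 for MNIST
scaled = [scale_pixel(p) for p in flat]    # every value now lies in [-1, 1]

print(flat)
print(scaled)
```

The darkest pixel (0) maps to -1 and the brightest (255) maps to 1, which centers the inputs around zero.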

## One-hot encoding

For training our network, Keras requires each of our image labels to be in the one-hot representation. In the **one-hot representation**, we have a vector whose length is equal to the number of classes, in which the index that represents the class associated with that label is 1 and every other entry is 0:

```python
# convert class vectors to a matrix of one-hot vectors
n_classes = 10
y_train = to_categorical(y_train, n_classes)
y_test = to_categorical(y_test, n_classes)
```
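As an illustration of what one-hot encoding does to a single label, the following plain-Python sketch builds the vector by hand. The function `one_hot` is a hypothetical helper written for this example; it is not how `to_categorical` is implemented.

```python
def one_hot(label, n_classes):
    """Return a list of length n_classes with 1.0 at index `label` and 0.0 elsewhere."""
    if not 0 <= label < n_classes:
        raise ValueError("label must be in the range [0, n_classes)")
    vector = [0.0] * n_classes
    vector[label] = 1.0
    return vector


# The digit 3 in a 10-class problem becomes a vector with a single 1.0 at index 3.
encoded = one_hot(3, 10)
print(encoded)
```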

## Defining the model

After having prepared the data, we will define the parameters of our model and instantiate it. We have already imported from Keras the classes that are necessary to build and train a fully connected neural network. Whereas the `Dense` class instantiates a dense layer, the `Sequential` class allows us to connect these dense layers in a chain. We also imported the stochastic gradient descent optimizer, `SGD`, so that we can perform gradient descent on the loss to update the model parameters. We create a model with two dense layers. The first layer projects the reshaped `h * w` image input to 128 nodes, and the second layer projects the 128 nodes down to 10 nodes representing the number of classes in this problem:

```python
n_hidden = 128
model = Sequential()
model.add(Dense(n_hidden, activation='tanh', input_dim=h * w))
model.add(Dense(n_classes, activation='softmax'))
model.summary()
```
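As a sanity check on what `model.summary()` reports, the number of trainable parameters can be worked out by hand. The sketch below assumes 28 x 28 inputs and uses a hypothetical helper, `dense_params`, that counts one weight per input-unit pair plus one bias per unit.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters in a dense layer: a weight matrix plus one bias per unit."""
    return n_inputs * n_units + n_units


h, w = 28, 28        # MNIST image height and width
n_hidden = 128
n_classes = 10

hidden_params = dense_params(h * w, n_hidden)       # 784 * 128 + 128
output_params = dense_params(n_hidden, n_classes)   # 128 * 10 + 10
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)   # 100480 1290 101770
```

If the totals in the summary differ from these, the input shape or layer sizes are probably not what was intended.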

## Compiling the model

After defining our model, we define the optimizer’s parameters that will be used to update the weights of our model given the `loss`. We choose a small learning rate of `0.001` and use the defaults for the other parameters:

```python
sgd = SGD(learning_rate=0.001)
```
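To show what a learning rate of `0.001` means for a single update, here is a minimal sketch of the plain gradient descent rule, assuming no momentum (the default for the Keras `SGD` optimizer). The function `sgd_step` and the toy objective are illustrative assumptions, not part of Keras.

```python
def sgd_step(params, grads, learning_rate=0.001):
    """Move each parameter a small step against its gradient."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


# Toy objective: f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
weights = [0.0]
for _ in range(1000):
    grads = [2 * (weights[0] - 3.0)]
    weights = sgd_step(weights, grads)

# After 1,000 small steps the weight has moved most of the way toward the minimum at 3.
print(weights)
```

With such a small learning rate each step changes the weight only slightly, which is why training needs many updates to converge.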

Finally, we compile our model, setting the `loss` function to `categorical_crossentropy`, which is used in classification problems where each sample belongs to a single class. We use `accuracy` as the reported metric because, for this problem, we are interested in increasing the accuracy of our model:

```python
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
```
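The loss and the metric chosen here can both be computed by hand on a toy batch. The sketch below is plain Python under stated assumptions: the function names, the three-class example, and the small `eps` guard against `log(0)` are illustrative and are not the Keras implementation.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one sample: minus the sum over classes of y_true[k] * log(y_pred[k])."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


def accuracy(y_true_batch, y_pred_batch):
    """Fraction of samples whose highest-probability class matches the true class."""
    correct = sum(
        1
        for t, p in zip(y_true_batch, y_pred_batch)
        if t.index(max(t)) == p.index(max(p))
    )
    return correct / len(y_true_batch)


# Hypothetical one-hot labels and softmax outputs for two samples and three classes.
y_true = [[0, 0, 1], [1, 0, 0]]
y_pred = [[0.1, 0.2, 0.7], [0.3, 0.6, 0.1]]

losses = [categorical_crossentropy(t, p) for t, p in zip(y_true, y_pred)]
mean_loss = sum(losses) / len(losses)

print(mean_loss)
print(accuracy(y_true, y_pred))   # 0.5: the first sample is right, the second is wrong
```

A useful reference point: a model that assigns equal probability to all 10 digits has a loss of ln(10), roughly 2.30, so a training loss near that value means the model has not learned anything yet.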

## Training the model

We train our model for as many epochs as necessary. One **epoch** is equivalent to a pass through the entire training data. The batch size is usually chosen to make good use of the available memory during training, that is, as large as comfortably fits. This matters especially when using GPUs, so that our models use the available resources in parallel:

```python
model.fit(X_train, y_train, epochs=10, batch_size=128)
```
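The relationship between epochs, batch size, and the number of weight updates can be checked with a little arithmetic. The sketch below assumes the 60,000 training samples mentioned earlier; `steps_per_epoch` is a hypothetical helper for this example.

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Number of batches needed to pass through all samples once."""
    return math.ceil(n_samples / batch_size)


n_train = 60_000
batch_size = 128
epochs = 10

steps = steps_per_epoch(n_train, batch_size)
last_batch = n_train - (steps - 1) * batch_size   # the final batch is smaller
total_updates = steps * epochs

print(steps, last_batch, total_updates)           # 469 96 4690
```

This is why the Keras progress bar counts up to 469 in each epoch for this configuration.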

## Evaluating the model

After training the model, we can check whether our model is generalizing to data that it has not seen by looking at our model’s performance on the test data:

```python
score = model.evaluate(X_test, y_test, batch_size=128)
print(score)
```

The preceding code block prints a two-element list containing the test loss followed by the test accuracy.

## Complete code

The complete code is given below. In it, we define our model and its parameters, train the model, and find its loss and accuracy.