Evaluating the Model and Generating Captions from It
Explore how to evaluate a trained transformer model for image captioning using accuracy and BLEU metrics, and understand the inference process to generate captions. Learn to create a caption generation function that predicts tokens step-by-step until completion, enabling practical application of the model on new images.
Evaluating the model
With the model trained, let's evaluate it on our unseen test dataset. The testing logic is almost identical to the validation logic we discussed earlier during model training, so we won't repeat the details here.
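The evaluation code below relies on the BLEUMetric helper we built earlier. As a quick refresher, a minimal sketch of such a helper (not necessarily the exact implementation used in this lesson) could wrap NLTK's corpus_bleu. This sketch assumes a Keras-style tokenizer exposing an index_word mapping, targets given as token IDs with 0 used for padding, and predictions given as per-token probability distributions:

import numpy as np
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

class BLEUMetric:
    """Computes corpus-level BLEU between target and predicted token sequences."""
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def calculate_bleu_from_predictions(self, real, predicted):
        # predicted: (batch, time, n_vocab) probabilities -> greedy token IDs
        pred_ids = np.argmax(predicted, axis=-1)
        references, hypotheses = [], []
        for ref_seq, hyp_seq in zip(np.asarray(real), pred_ids):
            # Drop padding (ID 0) and map IDs back to words
            references.append(
                [[self.tokenizer.index_word.get(int(i), '') for i in ref_seq if i > 0]]
            )
            hypotheses.append(
                [self.tokenizer.index_word.get(int(i), '') for i in hyp_seq if i > 0]
            )
        # Smoothing avoids zero scores when short captions miss higher-order n-grams
        return corpus_bleu(
            references, hypotheses,
            smoothing_function=SmoothingFunction().method1
        )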
bleu_metric = BLEUMetric(tokenizer=tokenizer)

test_dataset, _ = generate_tf_dataset(
    test_captions_df, tokenizer=tokenizer, n_vocab=n_vocab,
    batch_size=batch_size, training=False
)

test_loss, test_accuracy, test_bleu = [], [], []

for ti, t_batch in enumerate(test_dataset):
    print(f"{ti+1} batches processed", end='\r')
    loss, accuracy = full_model.test_on_batch(t_batch[0], t_batch[1])
    batch_predicted = full_model.predict_on_batch(t_batch[0])
    bleu_score = bleu_metric.calculate_bleu_from_predictions(
        t_batch[1], batch_predicted
    )
    test_loss.append(loss)
    test_accuracy.append(accuracy)
    test_bleu.append(bleu_score)

print(
    f"\ntest_loss: {np.mean(test_loss)} - "
    f"test_accuracy: {np.mean(test_accuracy)} - "
    f"test_bleu: {np.mean(test_bleu)}"
)
This will output:
261 batches processed
test_loss: 1.057080413646625 - test_accuracy: 0.7914185857407434 - test_bleu: 0.10505496256163914
Great, we can see that the model shows performance similar to what it achieved on the validation data. This means our model has not overfitted the training data and should perform reasonably well in the real world. Let's now generate captions for a few sample images.
Captions generated for test images
With the help of metrics such as accuracy and BLEU, we have verified that our model is performing well. But one of the most important tasks a trained model has to perform is generating outputs for new data. We'll learn how we can generate captions for unseen images by predicting one token at a time, feeding each prediction back into the model until the caption is complete, as sketched below.
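To make that concrete, here is a minimal sketch of greedy, token-by-token decoding. The function name generate_caption, the start/end token IDs, and the exact model input format (an image tensor plus the partial caption) are assumptions to adapt to this lesson's model and tokenizer:

import numpy as np

def generate_caption(model, image_input, tokenizer, start_id, end_id, max_len=30):
    # Begin with the start-of-sequence token and grow the caption one token at a time
    caption_ids = [start_id]
    for _ in range(max_len):
        decoder_input = np.array([caption_ids])
        # Assumed input format: (image tensor, partial caption); adapt as needed
        preds = model.predict_on_batch([image_input, decoder_input])
        # Greedily take the most likely token at the latest time step
        next_id = int(np.argmax(preds[0, -1, :]))
        caption_ids.append(next_id)
        if next_id == end_id:
            break
    # Map IDs back to words, dropping the start and end markers
    words = [tokenizer.index_word.get(i, '') for i in caption_ids[1:] if i != end_id]
    return ' '.join(words)

Greedy argmax decoding is the simplest choice; beam search or sampling could be swapped in to produce more diverse captions.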