Introduction: Image Captioning with Transformers

Explore how transformers revolutionize image captioning by integrating vision and text models. Understand data processing, model implementation, and training with TensorFlow using the MS-COCO dataset to generate accurate image descriptions.

We'll cover the following...

Applications of image captioning
Caption images using transformers
Chapter overview

Transformer models changed the playing field for many NLP problems. They have redefined the state of the art by a significant margin compared to the previous leaders: RNN-based models. We have already studied transformers and understand what makes them tick. Transformers have access to the whole sequence of items (e.g., a sequence of tokens), as opposed to RNN-based models that look at one item at a time, making them well suited for sequential problems. Following their success in the field of NLP, researchers have successfully used transformers to solve computer vision problems. Here, we’ll learn how to use transformers to solve a multimodal problem involving both images and text: image captioning. ...

1.Introduction to Natural Language Processing

2.Understanding TensorFlow 2

3.Word2vec: Learning Word Embeddings

4. Advanced Word Vector Algorithms

5.Sentence Classification with Convolutional Neural Networks

6.Recurrent Neural Networks

7.Understanding Long Short-Term Memory Networks

8.Applications of LSTM: Generating Text

9.Sequence-to-Sequence Learning: Neural Machine Translation

10.Transformers

Project

11.Image Captioning with Transformers

12.Final Remarks

13.Appendix: Mathematical Foundations and Advanced TensorFlow

Mock Interview

Introduction: Image Captioning with Transformers