Generate Captions and Tags for an Image

Learn how to generate meaningful captions and associate tags for your images using the Azure Computer Vision service.


In this lesson, we’ll look at our first image analysis application using the Computer Vision API. We’ll take an image from the internet and pass it to our Computer Vision resource, which will return a description of the image based on the visual features it contains.


To work through the lessons in this chapter, the following Python package dependencies are required:

  • azure-cognitiveservices-vision-computervision
  • pillow

To learn how to install these dependencies, refer to the Appendix.
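If you’re setting up a local environment instead, the two packages listed above can typically be installed with pip (exact setup steps are covered in the Appendix):

```shell
pip install azure-cognitiveservices-vision-computervision pillow
```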


We’ll be using the describe_image() function of the ComputerVisionClient class. This function accepts the URL of an image and generates a description (caption) for it. The description is based on the tags that the Computer Vision service identifies in the image. The service can also generate more than one description and return all of them in its response. The response is in JSON format, with the descriptions sorted in descending order of confidence score. If an error occurs on the Computer Vision end, the function returns an error code along with an error message.
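As a rough sketch of what this call looks like in code (the endpoint, key, and image URL here are placeholders, not values from this lesson), we can wrap describe_image() in a small helper that returns the captions and tags:

```python
def describe_from_url(endpoint, key, image_url, max_candidates=3):
    """Ask the Computer Vision service to describe the image at image_url.

    endpoint and key are placeholders for your own Azure resource values.
    Returns a list of (caption_text, confidence) pairs and the list of tags.
    """
    # Imported inside the function so the helper below stays usable
    # even if the Azure SDK isn't installed yet.
    from azure.cognitiveservices.vision.computervision import ComputerVisionClient
    from msrest.authentication import CognitiveServicesCredentials

    client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(key))
    analysis = client.describe_image(image_url, max_candidates=max_candidates)
    captions = [(c.text, c.confidence) for c in analysis.captions]
    return captions, list(analysis.tags)


def top_caption(captions):
    """Pick the caption text with the highest confidence from (text, confidence) pairs."""
    return max(captions, key=lambda c: c[1])[0] if captions else None
```

Because the service already sorts descriptions by confidence, the first caption in the response is the most likely one; top_caption() just makes that selection explicit for a list of extracted pairs.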

Later in the lesson, we’ll also explore the describe_image_in_stream() function. It works the same way as describe_image(); the only difference is that describe_image_in_stream() accepts the image as a binary stream instead of a URL. This lets us read an image from the local machine and pass its data to the function to generate a description.
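A similar sketch for the stream variant, again with placeholder credentials; the only real change is opening a local file in binary mode and passing the file object instead of a URL:

```python
def describe_from_file(endpoint, key, image_path, max_candidates=3):
    """Describe a local image by sending its bytes to the Computer Vision service.

    endpoint and key are placeholders for your own Azure resource values.
    Returns a list of (caption_text, confidence) pairs.
    """
    # Imported inside the function so the module loads even if the
    # Azure SDK isn't installed yet.
    from azure.cognitiveservices.vision.computervision import ComputerVisionClient
    from msrest.authentication import CognitiveServicesCredentials

    client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(key))
    # describe_image_in_stream() takes a file-like object opened in binary mode.
    with open(image_path, "rb") as image_stream:
        analysis = client.describe_image_in_stream(
            image_stream, max_candidates=max_candidates
        )
    return [(c.text, c.confidence) for c in analysis.captions]
```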

Now let’s go ahead and implement this functionality.
