
Building an OCR Script for Images Using the Read API

Explore how to implement an OCR script utilizing Azure Computer Vision's Read API. Learn to authenticate, call the API, process JSON results, and draw bounding boxes around extracted text on images.

Introduction

We are going to build an OCR script that uses Azure Computer Vision's Read API to extract text from sample images.

To run the code snippets in this chapter on your local machine, see the Appendix section for the steps to install the dependencies (Python packages).

Implementing OCR

Importing the required packages

Let us first import all the packages we need to build the OCR functionality.

Python
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials
from PIL import Image, ImageDraw
import requests
from io import BytesIO
import shutil
import time

Authenticating and calling the Read API

Once the packages are imported, we authenticate the Computer Vision client with our subscription key and endpoint. With an authenticated client, we can call the Read API on the URL of a sample image.
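
The client code in this section uses two names, `computer_vision_endpoint` and `computer_vision_key`, without showing where they are defined. A minimal sketch of one way to supply them, assuming they are stored in environment variables: the variable names `COMPUTER_VISION_ENDPOINT` and `COMPUTER_VISION_KEY` and the helper `load_credentials` are hypothetical and not part of the Azure SDK.

```python
import os


def load_credentials(env=os.environ):
    """Return (endpoint, key) read from environment variables.

    The variable names below are hypothetical; use whatever names
    you chose when storing your own Computer Vision credentials.
    """
    endpoint = env.get("COMPUTER_VISION_ENDPOINT", "").strip()
    key = env.get("COMPUTER_VISION_KEY", "").strip()
    if not endpoint or not key:
        # Fail early with a clear message instead of a confusing
        # authentication error from the service later on.
        raise RuntimeError(
            "Set COMPUTER_VISION_ENDPOINT and COMPUTER_VISION_KEY first."
        )
    return endpoint, key
```

Calling `computer_vision_endpoint, computer_vision_key = load_credentials()` keeps the key out of the script itself. With both values defined, the client can be created and the Read API called: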

Python
client = ComputerVisionClient(
    computer_vision_endpoint,  # endpoint URL of your Computer Vision resource
    CognitiveServicesCredentials(computer_vision_key)  # subscription key
)
image_url = "https://cdn.pixabay.com/photo/2016/04/07/19/08/motivational-1314505__340.jpg"
read_response = client.read(image_url, raw=True)  # raw=True keeps the HTTP response headers available
  • From lines 1 to 4, ...