Building an OCR script for Images using Read API
Learn to extract text from Images using Computer Vision's Read API.
Introduction
We are going to build an OCR script that uses Azure Computer Vision's Read API to perform OCR on some sample images.
If you want to execute the code snippets in this chapter on your local machine, visit the Appendix section, where you can follow the steps to install the dependencies (Python packages).
Implementing OCR
Importing the required packages
Let us first import all the packages that we need to complete our OCR functionality.
```python
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials
from PIL import Image, ImageDraw
import requests
from io import BytesIO
import shutil
import time
```
Authenticating and calling the read API
Now that we have imported all the required packages, we need to authenticate the Computer Vision client using our subscription key and endpoint. Once authentication is done, we can call the Read API. Here is the code:
```python
client = ComputerVisionClient(
    computer_vision_endpoint,
    CognitiveServicesCredentials(computer_vision_key)
)

image_url = "https://cdn.pixabay.com/photo/2016/04/07/19/08/motivational-1314505__340.jpg"

read_response = client.read(image_url, raw=True)
```
- From lines 1 to 4, we use the `ComputerVisionClient` class to authenticate and create an instance of this class. We pass the subscription key and the endpoint of our Azure Computer Vision resource as parameters to the constructor of the `ComputerVisionClient` class.
- In line 6, we define the URL that will be used to fetch the image (we can specify any URL that points to an image).
- In line 8, we call the `read()` function using the `client` object that we just created and pass the image URL as its parameter.
Here, we are using an image URL and the `read()` function to extract the text from the image, but if you have the image in your local directory, you can use the `read_in_stream()` function instead. In the next lesson, we will use this function to read a PDF file and extract text out of it. The same approach works for images too.
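As a minimal sketch of the local-file variant, the call can be wrapped in a small helper. This assumes a `client` created as shown above; the helper name and the file path are hypothetical, not part of the Azure SDK:

```python
def read_local_image(client, image_path):
    # Open the image in binary mode and submit it to the Read API.
    # `client` is assumed to be an authenticated ComputerVisionClient;
    # the path passed in is only an example.
    with open(image_path, "rb") as image_file:
        return client.read_in_stream(image_file, raw=True)

# Illustrative usage:
# read_response = read_local_image(client, "sample_image.jpg")
```

The `raw=True` flag is kept so that, as with `read()`, the response headers (including `Operation-Location`) remain accessible.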
Fetching the results from read API
Now that we have called the Read API, the only step left is to read the JSON response from the API. Let's see the code for this:
```python
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials
from PIL import Image, ImageDraw
import requests
from io import BytesIO
import shutil
import time

# Authenticate the Computer Vision client.
client = ComputerVisionClient(
    computer_vision_endpoint,
    CognitiveServicesCredentials(computer_vision_key)
)

image_url = "https://cdn.pixabay.com/photo/2016/04/07/19/08/motivational-1314505__340.jpg"

response = client.read(image_url, raw=True)

def get_coordinates(bounding_box):
    return ((bounding_box[-2], bounding_box[-1]), (bounding_box[-6], bounding_box[-5]))

operation_location = response.headers["Operation-Location"]
operation_id = operation_location.split("/")[-1]

# Poll until the read operation completes.
while True:
    result = client.get_read_result(operation_id)
    if result.status not in ['notStarted', 'running']:
        break
    time.sleep(1)

if result.status == OperationStatusCodes.succeeded:
    response = requests.get(image_url)
    img = Image.open(BytesIO(response.content))

    draw = ImageDraw.Draw(img)

    for ocr_text in result.analyze_result.read_results:
        for line in ocr_text.lines:
            print(line.text)
            print(line.bounding_box)
            draw.rectangle(get_coordinates(line.bounding_box), outline='red')

    img.save("image.png")

# Do not modify the below lines as these are required to view the output in the browser.
shutil.copy2('./image.png', './output/image.png')
```
- From lines 1-9, we import the required packages.
- From lines 11-18, we authenticate and call the Read API.
- In lines 20 and 21, we define a function `get_coordinates()`, which accepts the bounding box returned by the Computer Vision service for the text identified in the image. The bounding box contains the coordinates where the text is present inside the image and is returned in the following order:

  `(bottom_left_x, bottom_left_y, bottom_right_x, bottom_right_y, top_right_x, top_right_y, top_left_x, top_left_y)`

  This means that the bounding box's coordinates start from the bottom left and move in the counter-clockwise direction. Now, to draw a rectangle over the text, we use the `rectangle()` method of the `ImageDraw` module, which needs the coordinates in the order:

  `(top_left_x, top_left_y, bottom_right_x, bottom_right_y)`

  Hence, we extract the corresponding values from the response received from the Computer Vision service and then pass these four values to the `rectangle()` method.
- In line 23, we use the `response` object to get the `Operation-Location` header, and then in line 24, we fetch the operation ID. This ID will be used to fetch the result of the `read()` call.
- From lines 27 to 31, we run a `while` loop. Inside the loop, we perform the following steps:
  - In line 28, we call the `get_read_result()` function using the `client` object, passing the operation ID as the parameter.
  - In line 29, we check whether the status of that operation is neither `running` nor `notStarted`. If so, the operation has been completed and the result is ready, so in line 30 we break the `while` loop.
  - In line 31, we pause for one second and then repeat the loop until the operation is completed.
- In lines 34 and 35, we download the image from the URL so that we can draw the bounding boxes of the identified text.
- In line 37, we create the `draw` object that will be used to draw each bounding box.
- In lines 41 and 42, we print the text extracted by the Read API line by line, along with its corresponding bounding box. First, in line 33, we check whether the status of our operation is `succeeded`. Then, in lines 39 and 40, we run loops to iterate over all the text that has been extracted.
- In line 43, we draw the rectangles using the bounding box data received from the Computer Vision service.
- Finally, in line 45, we save the image after drawing all the bounding boxes.
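To make the coordinate extraction concrete, here is `get_coordinates()` run on a hypothetical bounding box that follows the ordering described above (the numbers are made up for illustration):

```python
def get_coordinates(bounding_box):
    # The last (x, y) pair is the top-left corner and the second pair is
    # the bottom-right corner, per the ordering described above.
    return ((bounding_box[-2], bounding_box[-1]),
            (bounding_box[-6], bounding_box[-5]))

# Hypothetical box: bottom-left, bottom-right, top-right, top-left corners.
box = [10, 90, 200, 90, 200, 20, 10, 20]

print(get_coordinates(box))  # ((10, 20), (200, 90))
```

The result is exactly the `(top_left, bottom_right)` corner pair that `draw.rectangle()` expects.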
So, in this way, we can use Computer Vision's Read API to extract text from images.
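The polling step in the script above follows a general pattern that can be factored into a reusable helper. The sketch below uses the same status checks as the script; the helper name is our own, and in the real script you would pass the authenticated `ComputerVisionClient` as `client`:

```python
import time

def wait_for_read_result(client, operation_id, poll_interval=1.0):
    # Keep polling until the operation leaves the pending states.
    while True:
        result = client.get_read_result(operation_id)
        if result.status not in ('notStarted', 'running'):
            return result
        time.sleep(poll_interval)
```

As in the script, the caller should still check `result.status == OperationStatusCodes.succeeded` before reading `result.analyze_result`.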