
Building an OCR script for Images using Read API

Explore how to build an OCR script using Azure Computer Vision's Read API. Learn to authenticate the client, extract text from images, process JSON results, and draw bounding boxes around detected text. Gain practical experience with image input methods and handling API responses.

Introduction

We are going to build an OCR script that uses Azure Computer Vision’s Read API to perform OCR on some sample images.

If you want to execute the code snippets in this chapter on your local machine, visit the Appendix section, which walks through installing the dependencies (Python packages).

Implementing OCR

Importing the required packages

Let us first import all the packages that we need to complete our OCR functionality.

Python
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials
from PIL import Image, ImageDraw
import requests
from io import BytesIO
import time

Authenticating and calling the Read API

Once we have imported all the required packages, we need to authenticate the Computer Vision client using our subscription key and endpoint. After authentication, we can call the Read API. Here is the code:

Python
client = ComputerVisionClient(
    computer_vision_endpoint,
    CognitiveServicesCredentials(computer_vision_key)
)
image_url = "https://cdn.pixabay.com/photo/2016/04/07/19/08/motivational-1314505__340.jpg"
read_response = client.read(image_url, raw=True)
  • First, we create an instance of the ComputerVisionClient class, passing the endpoint of our Azure Computer Vision resource and its subscription key (wrapped in CognitiveServicesCredentials) to the constructor.

  • Next, we define image_url, the URL of the image to analyze (any publicly accessible image URL works here).

  • Finally, we call the read() function on the client object, passing the image URL as its parameter. The raw=True argument makes the SDK return the raw HTTP response, which we will need later to read the Operation-Location header.

Here, we use an image URL with the read() function to extract text from the image. If the image is stored locally instead, use the read_in_stream() function. In the next lesson, we will use that function to read a PDF file and extract text from it; the same approach works for local images.
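Sketching that alternative here: read_local_image below is a hypothetical helper (not part of the SDK) that wraps read_in_stream(). It assumes a client authenticated as shown above.

```python
def read_local_image(client, image_path):
    # read_in_stream() mirrors read(), but takes a file-like object
    # (opened in binary mode) instead of a URL.
    with open(image_path, "rb") as image_stream:
        return client.read_in_stream(image_stream, raw=True)

# Usage (assumes an authenticated `client` as created earlier):
# read_response = read_local_image(client, "sample_image.jpg")
```

As with read(), the raw=True argument gives access to the Operation-Location header needed to poll for results.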

Fetching the results from the Read API

After calling the Read API, the next step is to process the JSON response and extract all text from the image. We’ll also draw bounding boxes around the detected text.

Python
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials
from PIL import Image, ImageDraw
import time
import requests
from io import BytesIO

# Authenticate
client = ComputerVisionClient(
    computer_vision_endpoint,
    CognitiveServicesCredentials(computer_vision_key)
)

# Call the Read API on the image URL
image_url = "https://cdn.pixabay.com/photo/2016/04/07/19/08/motivational-1314505__340.jpg"
read_op = client.read(image_url, raw=True)
operation_location = read_op.headers["Operation-Location"]
operation_id = operation_location.split("/")[-1]

# Wait for the asynchronous operation to complete
while True:
    result = client.get_read_result(operation_id)
    if result.status not in ["notStarted", "running"]:
        break
    time.sleep(1)

# Download and open the image for drawing bounding boxes
response = requests.get(image_url)
img = Image.open(BytesIO(response.content))
draw = ImageDraw.Draw(img)

# Convert the API's bounding box (four corners, clockwise from top-left)
# into the (top_left_x, top_left_y, bottom_right_x, bottom_right_y)
# format that ImageDraw.rectangle() expects
def get_coordinates(bbox):
    return [bbox[0], bbox[1], bbox[4], bbox[5]]

# Extract and draw all text
if result.status == OperationStatusCodes.succeeded:
    for page in result.analyze_result.read_results:
        for line in page.lines:
            print(line.text)  # print the extracted text
            coords = get_coordinates(line.bounding_box)
            draw.rectangle(coords, outline="red", width=2)

# Save the image with bounding boxes
img.save("output_with_boxes.png")
print("Text extraction complete. Image saved as output_with_boxes.png")
  • We begin by importing all the required packages: ComputerVisionClient, OperationStatusCodes, CognitiveServicesCredentials, PIL’s Image and ImageDraw, time, requests, and BytesIO.

  • We then authenticate the Computer Vision client using our subscription key and endpoint, creating a client object.

  • We call the Read API on the image URL using the read() function and fetch the Operation-Location header from the raw response. The last segment of this URL is the operation ID, which we need in order to retrieve the results.

  • Because the Read API is asynchronous, we run a while loop to wait for the operation to complete:

    • On each iteration, we call get_read_result() with the operation ID.

    • If the operation status is no longer "notStarted" or "running", we break out of the loop.

    • Otherwise, we pause for one second between requests to avoid overwhelming the service.

  • Once the operation finishes, we download the image from the URL, open it with PIL, and create a draw object using ImageDraw.Draw(img) so that we can draw rectangles over the text.

  • The helper function get_coordinates() converts the bounding box returned by the Computer Vision service into the order expected by ImageDraw.

The bounding box from the API contains eight values: the four corner points of the detected rectangle, listed clockwise starting from the top-left corner:

(top_left_x, top_left_y, top_right_x, top_right_y, bottom_right_x, bottom_right_y, bottom_left_x, bottom_left_y)

ImageDraw.rectangle() instead requires the coordinates in this order:

(top_left_x, top_left_y, bottom_right_x, bottom_right_y)

so the function picks out the top-left and bottom-right values and returns them in the order needed for drawing rectangles.

  • Finally, we iterate over all pages and lines in result.analyze_result.read_results to:

    • Print each extracted line of text.

    • Convert its bounding box with get_coordinates().

    • Draw a rectangle around the text using draw.rectangle().

  • We save the final image, with all bounding boxes drawn, as output_with_boxes.png.

So, using this approach, the Read API extracts all text from the image and visually highlights it, ensuring nothing is missed.
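Because the full script needs a live Azure resource and credentials, the polling and drawing logic can also be exercised offline with stand-in objects. Everything below — FakeClient, the sample text line, and the box coordinates — is invented for illustration; only the polling loop and the drawing calls mirror the script above. The get_coordinates() variant here takes the min/max of the corner coordinates, which yields an upright box even if the detected text is slightly rotated:

```python
from types import SimpleNamespace
from PIL import Image, ImageDraw

class FakeClient:
    """Stand-in for ComputerVisionClient: reports "running" twice,
    then returns a succeeded result containing one line of text."""
    def __init__(self):
        self.calls = 0
        line = SimpleNamespace(text="OCR", bounding_box=[10, 20, 110, 20, 110, 50, 10, 50])
        page = SimpleNamespace(lines=[line])
        self.final = SimpleNamespace(
            status="succeeded",
            analyze_result=SimpleNamespace(read_results=[page]),
        )

    def get_read_result(self, operation_id):
        self.calls += 1
        if self.calls < 3:
            return SimpleNamespace(status="running")
        return self.final

client = FakeClient()

# Same polling loop as the script above (sleep omitted for the fake client)
while True:
    result = client.get_read_result("fake-operation-id")
    if result.status not in ["notStarted", "running"]:
        break

# min/max variant: an upright box regardless of corner order or rotation
def get_coordinates(bbox):
    xs, ys = bbox[0::2], bbox[1::2]
    return [min(xs), min(ys), max(xs), max(ys)]

# Draw on a blank canvas instead of a downloaded photo
img = Image.new("RGB", (200, 100), "white")
draw = ImageDraw.Draw(img)
for page in result.analyze_result.read_results:
    for line in page.lines:
        coords = get_coordinates(line.bounding_box)
        draw.rectangle(coords, outline="red", width=2)

print(result.status, client.calls)  # succeeded 3
```

Swapping FakeClient for a real, authenticated ComputerVisionClient (and the blank canvas for the downloaded image) recovers the behaviour of the full script.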