Building an OCR script for Images using Read API
Explore how to build an OCR script using Azure Computer Vision's Read API. Learn to authenticate the client, extract text from images, process JSON results, and draw bounding boxes around detected text. Gain practical experience with image input methods and handling API responses.
Introduction
We are going to build an OCR script that uses Azure Computer Vision's Read API to perform OCR on some sample images.
If you want to run the code snippets in this chapter on your local machine, visit the Appendix section and follow the steps to install the dependencies (Python packages).
Implementing OCR
Let's build the OCR functionality step by step.
Importing the required packages
First, let's import all the packages we will need for the OCR functionality.
Authenticating and calling the Read API
With the packages imported, we need to authenticate the Computer Vision client using our subscription key and endpoint. Once authentication is done, we can call the Read API. Here is the code:
- From lines 1 to 4, we use the `ComputerVisionClient` class to authenticate and create an instance of this class. We pass the subscription key and the endpoint of our Azure Computer Vision resource as parameters to the `ComputerVisionClient` constructor.
- In line 6, we define the URL that will be used to fetch the image (we can specify any URL that points to an image).
- In line 8, we call the `read()` function on the `client` object that we just created and pass the image URL as its parameter.
Here, we are using an image URL and the `read()` function to extract the text from the image, but if you have the image in your local directory, you can use the `read_in_stream()` function instead. In the next lesson, we will use this function to read a PDF file and extract text from it; the same approach works for images too.
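For a local image, a minimal sketch looks like this; the helper name and file path are placeholders of our own:

```python
def read_local_image(client, path):
    # read_in_stream() accepts a binary file object instead of a URL;
    # everything after this call (polling, fetching results) works the same way.
    with open(path, "rb") as image_file:
        return client.read_in_stream(image_file, raw=True)
```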
Fetching the results from the Read API
After calling the Read API, the next step is to process the JSON response and extract all text from the image. We’ll also draw bounding boxes around the detected text.
- From lines 1–9, we import all the required packages: `ComputerVisionClient`, `OperationStatusCodes`, `CognitiveServicesCredentials`, `PIL`, `requests`, `time`, and `BytesIO`.
- From lines 11–13, we authenticate the Computer Vision client using our subscription key and endpoint, creating a `client` object.
- In lines 15–18, we call the Read API on the image URL using the `read()` function and fetch the `Operation-Location` header. We extract the operation ID from this URL, which will be used to retrieve the results.
- In lines 20–22, we define the function `get_coordinates()`, which converts the bounding box returned by the Computer Vision service into the order expected by `ImageDraw`.
The bounding box from the API is returned as:
(bottom_left_x, bottom_left_y, bottom_right_x, bottom_right_y, top_right_x, top_right_y, top_left_x, top_left_y)
This order starts from the bottom left and moves counter-clockwise. ImageDraw.rectangle() requires the coordinates in this order:
(top_left_x, top_left_y, bottom_right_x, bottom_right_y)
The function extracts the corresponding values and returns them in the correct order for drawing rectangles.
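One way to sketch `get_coordinates()`, assuming the corner ordering described above:

```python
def get_coordinates(bounding_box):
    # The eight values are read in the order described above:
    # (bl_x, bl_y, br_x, br_y, tr_x, tr_y, tl_x, tl_y).
    # ImageDraw.rectangle() wants (top_left_x, top_left_y,
    # bottom_right_x, bottom_right_y), i.e. indices 6, 7, 2, 3.
    top_left_x, top_left_y = bounding_box[6], bounding_box[7]
    bottom_right_x, bottom_right_y = bounding_box[2], bounding_box[3]
    return (top_left_x, top_left_y, bottom_right_x, bottom_right_y)
```

Remember that in image coordinates the y-axis points downward, so the "bottom" corners have the larger y values.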
- From lines 25–31, we run a `while` loop to wait for the Read API operation to complete:
- In line 26, we call `get_read_result()` with the operation ID.
- In line 27, we check if the operation status is no longer `"notStarted"` or `"running"`. Once the status is `"succeeded"`, we break the loop.
- In line 29, we pause for one second between requests to avoid overwhelming the service.
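The polling loop can be sketched as a small helper; the function name and `poll_interval` parameter are our own additions:

```python
import time

def wait_for_read_result(client, operation_id, poll_interval=1.0):
    # Poll until the asynchronous Read operation leaves the
    # "notStarted"/"running" states.
    while True:
        result = client.get_read_result(operation_id)
        if result.status not in ("notStarted", "running"):
            return result
        # Pause between polls so we do not overwhelm the service.
        time.sleep(poll_interval)
```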
- In lines 33–35, we download the image from the URL and open it using PIL.
- In line 36, we create a `draw` object using `ImageDraw.Draw(img)` so that we can draw rectangles over the text.
- From lines 39–43, we iterate over all pages and lines in `result.analyze_result.read_results` to print each extracted line of text, extract its bounding box coordinates using `get_coordinates()`, and draw a rectangle around it using `draw.rectangle()`.
- In line 45, we save the final image with all the bounding boxes drawn.
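The download, draw, and save steps can be sketched as one helper; the function name, output path, and the inlined coordinate reordering (the same one `get_coordinates()` performs) are our own choices:

```python
from io import BytesIO

import requests
from PIL import Image, ImageDraw

def draw_text_boxes(result, image_url, output_path="result.png"):
    # Download the source image and open it with PIL.
    response = requests.get(image_url)
    img = Image.open(BytesIO(response.content))
    draw = ImageDraw.Draw(img)
    # read_results holds one entry per page; each page holds the detected lines.
    for page in result.analyze_result.read_results:
        for line in page.lines:
            print(line.text)
            # Reorder the 8-value bounding box into
            # (top_left_x, top_left_y, bottom_right_x, bottom_right_y).
            box = (line.bounding_box[6], line.bounding_box[7],
                   line.bounding_box[2], line.bounding_box[3])
            draw.rectangle(box, outline="red")
    img.save(output_path)
```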
With this approach, the Read API extracts all the text from the image, and the drawn bounding boxes visually highlight every detected line.