Building an OCR script for Images using Read API
Learn to extract text from Images using Computer Vision's Read API.
Introduction
We are going to build an OCR script that uses Azure Computer Vision's Read API to perform OCR on some sample images.
If you want to execute the code snippets in this chapter on your local machine, visit the Appendix section, where you can follow the steps to install the dependencies (Python packages).
Implementing OCR
Importing the required packages
Let us first import all the packages that we need to complete our OCR functionality.
```python
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials
from PIL import Image, ImageDraw
import requests
from io import BytesIO
import shutil
import time
```
Authenticating and calling the read API
Now that we have imported all the required packages, we need to authenticate the Computer Vision client using our subscription key and endpoint. Once authentication is done, we can call the Read API. Here is the code:
```python
client = ComputerVisionClient(
    computer_vision_endpoint,
    CognitiveServicesCredentials(computer_vision_key)
)

image_url = "https://cdn.pixabay.com/photo/2016/04/07/19/08/motivational-1314505__340.jpg"

read_response = client.read(image_url, raw=True)
```
- From lines 1 to 4, we use the `ComputerVisionClient` class to authenticate and create an instance of this class. We pass the subscription key and the endpoint of our Azure Computer Vision resource as parameters to the constructor of the `ComputerVisionClient` class.
- In line 6, we define the URL that will be used to fetch the image (we can specify any URL that points to an image).
- In line 8, we call the `read()` function using the `client` object that we just created and pass the image URL as its parameter.
Here, we are using an image URL and the `read()` function to extract the text from the image, but if you have the image in your local directory, you can use the `read_in_stream()` function instead. In the next lesson, we will use this function to read a PDF file and extract text out of it. The same approach works for images too.
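As a minimal sketch of the local-file variant, the call can be wrapped in a small helper. This assumes a `client` created as shown above; the helper name and the file path are hypothetical, not part of the Azure SDK:

```python
def read_local_image(client, image_path):
    # Open the image in binary mode and submit it to the Read API.
    # `client` is assumed to be an authenticated ComputerVisionClient;
    # the path passed in is only an example.
    with open(image_path, "rb") as image_file:
        return client.read_in_stream(image_file, raw=True)

# Illustrative usage:
# read_response = read_local_image(client, "sample_image.jpg")
```

The `raw=True` flag is kept so that, as with `read()`, the response headers (including `Operation-Location`) remain accessible.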
Fetching the results from read API
Now that we have called the Read API, the only step left is to read the JSON response from the API. Let's see the code for this:
```python
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials
from PIL import Image, ImageDraw
import requests
from io import BytesIO
import shutil
import time

# Authenticate the Computer Vision client.
client = ComputerVisionClient(
    computer_vision_endpoint,
    CognitiveServicesCredentials(computer_vision_key)
)

image_url = "https://cdn.pixabay.com/photo/2016/04/07/19/08/motivational-1314505__340.jpg"

response = client.read(image_url, raw=True)

def get_coordinates(bounding_box):
    return ((bounding_box[-2], bounding_box[-1]), (bounding_box[-6], bounding_box[-5]))

operation_location = response.headers["Operation-Location"]
operation_id = operation_location.split("/")[-1]

# Poll until the read operation completes.
while True:
    result = client.get_read_result(operation_id)
    if result.status not in ['notStarted', 'running']:
        break
    time.sleep(1)

if result.status == OperationStatusCodes.succeeded:
    response = requests.get(image_url)
    img = Image.open(BytesIO(response.content))

    draw = ImageDraw.Draw(img)

    for ocr_text in result.analyze_result.read_results:
        for line in ocr_text.lines:
            print(line.text)
            print(line.bounding_box)
            draw.rectangle(get_coordinates(line.bounding_box), outline='red')

    img.save("image.png")

# Do not modify the below lines as these are required to view the output in the browser.
shutil.copy2('./image.png', './output/image.png')
```
- From lines 1-9, we import the required packages.
- From lines 11-18, we authenticate and call the Read API.
- In lines 20 and 21, we define a function `get_coordinates()`, which accepts the bounding box returned by the Computer Vision service for the text identified in the image. The bounding box contains the coordinates where the text is present inside the image and is returned in the following order:

  `(bottom_left_x, bottom_left_y, bottom_right_x, bottom_right_y, top_right_x, top_right_y, top_left_x, top_left_y)`

  This means that the bounding box's coordinates start from the bottom left and move in the counter-clockwise direction. Now, to draw a rectangle over the text, we use the `rectangle()` method of the `ImageDraw` module, which needs the coordinates in the order:

  `(top_left_x, top_left_y, bottom_right_x, bottom_right_y)`

  Hence, we extract the corresponding values from the response received from the Computer Vision service and then pass these four values to the `rectangle()` method.
- In line 23, we use the `response` object to get the `Operation-Location` header, and then in line 24, we fetch the operation ID. This ID will be used to fetch the result of the `read()` call.
- From lines 27 to 31, we run a `while` loop. Inside the loop, we perform the following steps:
  - In line 28, we call the `get_read_result()` function using the `client` object, passing the operation ID as the parameter.
  - In line 29, we check whether the status of that operation is neither `running` nor `notStarted`. If so, the operation has been completed and the result is ready, so in line 30 we break the `while` loop.
  - In line 31, we pause for one second and then repeat the loop until the operation is completed.
- In lines 34 and 35, we download the image from the URL so that we can draw the bounding boxes of the identified text.
- In line 37, we create the `draw` object that will be used to draw each bounding box.
- In lines 41 and 42, we print the text extracted by the Read API line by line, along with its corresponding bounding box. First, in line 33, we check whether the status of our operation is `succeeded`. Then, in lines 39 and 40, we run loops to iterate over all the text that has been extracted.
- In line 43, we draw the rectangles using the bounding box data received from the Computer Vision service.
- Finally, in line 45, we save the image after drawing all the bounding boxes.
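To make the coordinate extraction concrete, here is `get_coordinates()` run on a hypothetical bounding box that follows the ordering described above (the numbers are made up for illustration):

```python
def get_coordinates(bounding_box):
    # The last (x, y) pair is the top-left corner and the second pair is
    # the bottom-right corner, per the ordering described above.
    return ((bounding_box[-2], bounding_box[-1]),
            (bounding_box[-6], bounding_box[-5]))

# Hypothetical box: bottom-left, bottom-right, top-right, top-left corners.
box = [10, 90, 200, 90, 200, 20, 10, 20]

print(get_coordinates(box))  # ((10, 20), (200, 90))
```

The result is exactly the `(top_left, bottom_right)` corner pair that `draw.rectangle()` expects.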
So, in this way, we can use Computer Vision's Read API to extract text from images.
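The polling step in the script above follows a general pattern that can be factored into a reusable helper. The sketch below uses the same status checks as the script; the helper name is our own, and in the real script you would pass the authenticated `ComputerVisionClient` as `client`:

```python
import time

def wait_for_read_result(client, operation_id, poll_interval=1.0):
    # Keep polling until the operation leaves the pending states.
    while True:
        result = client.get_read_result(operation_id)
        if result.status not in ('notStarted', 'running'):
            return result
        time.sleep(poll_interval)
```

As in the script, the caller should still check `result.status == OperationStatusCodes.succeeded` before reading `result.analyze_result`.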