Building an OCR Script for Documents Using the Read API

Understand how to implement Optical Character Recognition (OCR) for PDF documents and images using Azure Computer Vision's Read API. Learn to authenticate, submit a document stream, and extract text along with bounding-box coordinates. Practice with Python by building a script that reads text from a local file.

Introduction

In the previous lesson, we saw how to extract text from an image. Now, we'll look at how to extract text from a PDF document.

You can download the sample PDF used in this lesson below:

printed_handwritten.pdf

Implementation for Documents

Now that you have the sample PDF, we can move on to the implementation.

Python
import time
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
client = ComputerVisionClient(
    computer_vision_endpoint,  # endpoint URL of your Computer Vision resource (defined beforehand)
    CognitiveServicesCredentials(computer_vision_key)  # key of the same resource (defined beforehand)
)
def pdf_to_text():
    filepath = open('CourseAssets/printed_handwritten.pdf', 'rb')  # open the PDF as a binary stream
    response = client.read_in_stream(filepath, raw=True)  # submit it to the asynchronous Read API
    filepath.close()
    operation_location = response.headers["Operation-Location"]  # URL of the queued read operation
    operation_id = operation_location.split("/")[-1]  # the operation ID is its last path segment
    while True:
        result = client.get_read_result(operation_id)
        if result.status.lower() not in ['notstarted', 'running']:  # stop once succeeded or failed
            break
        time.sleep(10)  # wait before polling again
    return result
result = pdf_to_text()
if result.status == OperationStatusCodes.succeeded:
    for readResult in result.analyze_result.read_results:  # one entry per page
        for line in readResult.lines:
            print(line.text)  # recognized text of the line
            print(line.bounding_box)  # 8 numbers: x, y of the four corners
  • In lines 1 to 4, we import the required packages. ...
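The script prints each line's `bounding_box`. In Read API 3.x results this is a flat list of eight numbers: the x, y coordinates of the line's four corners, clockwise from the top-left corner. The units follow the page's `unit` field (pixels for images, inches for PDFs). If you only need a simple rectangle, for example to crop or highlight a line, a minimal sketch like the one below can collapse the polygon. The helper name `polygon_to_rect` and the sample coordinates are hypothetical; the code assumes only the eight-number layout just described.

```python
from typing import List, Tuple


def polygon_to_rect(bounding_box: List[float]) -> Tuple[float, float, float, float]:
    """Collapse an 8-number polygon [x1, y1, ..., x4, y4] into an
    axis-aligned rectangle (left, top, width, height)."""
    if len(bounding_box) != 8:
        raise ValueError("expected 8 numbers (x, y for 4 corner points)")
    xs = bounding_box[0::2]  # x coordinates of the four corners
    ys = bounding_box[1::2]  # y coordinates of the four corners
    left, top = min(xs), min(ys)
    return left, top, max(xs) - left, max(ys) - top


if __name__ == "__main__":
    # Hypothetical line box in inches, as it might appear for a PDF page.
    box = [1.0, 0.5, 4.0, 0.5, 4.0, 0.75, 1.0, 0.75]
    print(polygon_to_rect(box))  # (1.0, 0.5, 3.0, 0.25)
```

Taking the min and max over all four corners keeps the rectangle correct even when the text is slightly rotated, at the cost of some extra padding around skewed lines.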
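The `while True` loop in the script polls `get_read_result` until the status leaves `notStarted`/`running`, with no upper bound on how long it waits. If you want the script to give up eventually, one option is a bounded poll such as the sketch below. The helper `wait_for_read_result` and its `interval`/`timeout` parameters are hypothetical and not part of the Azure SDK; the sketch only assumes that the fetched object has a string-like `status`, as the result of `client.get_read_result` does.

```python
import time
from typing import Any, Callable

# Status values (lowercased) meaning the read operation has not finished yet.
PENDING_STATUSES = {"notstarted", "running"}


def wait_for_read_result(
    fetch: Callable[[], Any],
    interval: float = 1.0,
    timeout: float = 60.0,
) -> Any:
    """Call fetch() until its result's status is no longer pending.

    Returns the last result (succeeded or failed) and raises TimeoutError
    if the operation is still pending after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result.status.lower() not in PENDING_STATUSES:
            return result  # finished: the caller checks succeeded vs. failed
        if time.monotonic() >= deadline:
            raise TimeoutError(f"read operation still pending after {timeout} s")
        time.sleep(interval)  # pause before the next poll
```

With the client and `operation_id` from the script, it could be called as `result = wait_for_read_result(lambda: client.get_read_result(operation_id))`. The one-second default interval is an assumption; it reacts faster than the script's 10-second sleep at the cost of a few more requests.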