Introduction to OCR using Computer Vision's Read API

Learn about optical character recognition and how it works with Azure Computer Vision's Read API.

Introduction to OCR

OCR stands for Optical Character Recognition. It is the task of recognizing printed and handwritten characters in an image and converting them into machine-readable digital text. To do this efficiently and accurately, OCR is usually broken into several sub-processes:

  • Preprocessing of the image
  • Text localization
  • Character segmentation
  • Character recognition
  • Post-processing

The exact processes can differ from case to case, but these are the typical steps needed to perform OCR on printed and handwritten characters.
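To make the five stages concrete, here is a toy, pure-Python sketch that runs each one on a tiny synthetic grayscale image. Everything in it is a simplifying assumption for illustration: the hypothetical 3x3 glyph templates, the fixed binarization threshold of 128, nearest-template matching for recognition, and the small hypothetical lexicon used in post-processing. Production OCR engines, including the Read API, rely on learned models rather than anything this simple.

```python
# Toy OCR pipeline (illustrative assumptions only): 0-255 grayscale images stored as
# lists of rows, hypothetical 3x3 glyph templates, and a hypothetical lexicon.

# Hypothetical glyph templates (1 = ink). None has an all-blank column,
# so blank columns can safely be used as character separators.
TEMPLATES = {
    "I": ((1, 1, 1), (0, 1, 0), (1, 1, 1)),
    "L": ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
    "O": ((1, 1, 1), (1, 0, 1), (1, 1, 1)),
    "T": ((1, 1, 1), (0, 1, 0), (0, 1, 0)),
}
LEXICON = ("LOT", "TOIL")  # hypothetical word list for post-processing


def preprocess(gray, threshold=128):
    """Preprocessing: binarize, so dark pixels become ink (1) and the rest background (0)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def localize_text(binary):
    """Text localization: bounding box (top, bottom, left, right) of all ink, or None."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    return rows[0], rows[-1], cols[0], cols[-1]


def segment_characters(binary, box):
    """Character segmentation: crop to the box and split it at all-blank columns."""
    top, bottom, left, right = box
    region = [row[left:right + 1] for row in binary[top:bottom + 1]]
    glyphs, current = [], []
    for c in range(len(region[0])):
        column = tuple(row[c] for row in region)
        if any(column):
            current.append(column)
        elif current:
            glyphs.append(current)
            current = []
    if current:
        glyphs.append(current)
    # Transpose each glyph's list of columns back into a tuple of rows.
    return [tuple(zip(*cols)) for cols in glyphs]


def recognize(glyph):
    """Character recognition: nearest template by count of mismatched pixels."""
    def mismatches(template):
        if len(template) != len(glyph) or len(template[0]) != len(glyph[0]):
            return float("inf")
        return sum(a != b for t_row, g_row in zip(template, glyph) for a, b in zip(t_row, g_row))
    return min(TEMPLATES, key=lambda ch: mismatches(TEMPLATES[ch]))


def postprocess(text):
    """Post-processing: snap to a same-length lexicon word differing in at most one character."""
    for word in LEXICON:
        if len(word) == len(text) and sum(a != b for a, b in zip(word, text)) <= 1:
            return word
    return text


def ocr(gray):
    """Run all five stages in order and return the recognized string."""
    binary = preprocess(gray)
    box = localize_text(binary)
    if box is None:
        return ""
    raw = "".join(recognize(g) for g in segment_characters(binary, box))
    return postprocess(raw)


def render(text, gap=1, pad=1):
    """Build a synthetic test image (0 = black ink, 255 = white) from the templates."""
    rows = [[] for _ in range(3)]
    for i, ch in enumerate(text):
        for r in range(3):
            if i:
                rows[r].extend([255] * gap)
            rows[r].extend(0 if bit else 255 for bit in TEMPLATES[ch][r])
    width = len(rows[0]) + 2 * pad
    body = [[255] * pad + row + [255] * pad for row in rows]
    return [[255] * width for _ in range(pad)] + body + [[255] * width for _ in range(pad)]


if __name__ == "__main__":
    image = render("LOT")
    image[2][6] = 0  # add one noisy pixel in the middle of the "O"
    print(ocr(image))  # prints LOT: the noisy glyph is still nearest to "O"
```

Each function maps to one bullet in the list, which is the point of the sketch: the stages are independent steps, so any one of them (for example, recognition) can be swapped for a better technique without touching the others.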

The Read API

The Azure Computer Vision service provides a Read API, which extracts both printed and handwritten text from images and multi-page PDF documents. The API is optimized for text-heavy images and multi-page PDFs.
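The Read API works asynchronously: a POST to the analyze endpoint returns `202 Accepted` with an `Operation-Location` header, and the client then polls that URL until the status is `succeeded` (or `failed`). The sketch below uses only the Python standard library and assumes the v3.2 REST route `/vision/v3.2/read/analyze`, the `Ocp-Apim-Subscription-Key` header, and the `analyzeResult.readResults[].lines[].text` response shape; check these against the current Azure documentation before relying on them. The helper names are hypothetical, and the endpoint, key, and document URL in the usage comments are placeholders.

```python
import json
import time
import urllib.request

# Assumed v3.2 REST route; verify against the current Azure Computer Vision docs.
READ_PATH = "/vision/v3.2/read/analyze"


def http_json(method, url, key, body=None):
    """Send a request with the subscription-key header; return (headers, parsed JSON or None)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Ocp-Apim-Subscription-Key", key)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        payload = resp.read()
        return resp.headers, (json.loads(payload) if payload else None)


def submit_read(endpoint, key, document_url):
    """Start a Read operation for an image or PDF URL; return the URL to poll."""
    headers, _ = http_json("POST", endpoint.rstrip("/") + READ_PATH, key, {"url": document_url})
    return headers["Operation-Location"]


def poll_read_result(fetch_json, operation_url, interval=1.0, max_attempts=30):
    """Call fetch_json(operation_url) until the operation succeeds, fails, or times out."""
    for _ in range(max_attempts):
        result = fetch_json(operation_url)
        status = result.get("status")  # e.g. "notStarted", "running", "succeeded", "failed"
        if status == "succeeded":
            return result
        if status == "failed":
            raise RuntimeError("Read operation failed")
        time.sleep(interval)
    raise TimeoutError("Read operation did not finish in time")


def extract_lines(result):
    """Flatten analyzeResult.readResults[*].lines[*].text into (page, text) pairs."""
    return [
        (page["page"], line["text"])
        for page in result["analyzeResult"]["readResults"]
        for line in page.get("lines", [])
    ]


# Usage (placeholders; requires a real Computer Vision resource):
# ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
# KEY = "<your-key>"
# op_url = submit_read(ENDPOINT, KEY, "https://example.com/sample.pdf")
# result = poll_read_result(lambda u: http_json("GET", u, KEY)[1], op_url)
# for page, text in extract_lines(result):
#     print(page, text)
```

Passing the fetch function into `poll_read_result` keeps polling and result parsing separate from networking, so both can be exercised with canned responses before pointing the code at a real Azure resource.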

Below is a snapshot from the official Microsoft Azure documentation that illustrates how this API works with images and PDFs.
