Tesseract text detection in computer vision

Computer vision involves extracting information from visual data and supports tasks such as classification, prediction, and recognition. In this Answer, we'll look at how to detect text in images and videos using Tesseract, a classic application of optical character recognition.

Optical character recognition

Optical character recognition (OCR) is a technology that enables machines to interpret images of text and convert them into machine-readable formats. It lets us put printed or handwritten text to use in software, for example, by searching or editing it.

Simply put, the goal of OCR is to take characters as a human perceives them and convert them into machine-encoded text.

Recognition of digital or handwritten text

Text detection

The concept of optical character recognition is used in text detection, where we aim to identify and recognize the text found within an image or a video. We will look into its implementation and applications shortly.

Text detection

Tesseract

In this Answer, we will perform text detection using pytesseract, a Python wrapper for the Tesseract engine. Tesseract is an open-source OCR engine, originally developed at Hewlett-Packard and later sponsored by Google, that converts text in media to machine-encoded text and generally performs well on clean, printed text.

Tesseract OCR logo

Generic code walkthrough

We'll learn how to load an image, detect its text, and visualize it.

from pytesseract import Output, image_to_data
import cv2
  • Let's start by importing the necessary modules.

    • cv2 is used for image processing

    • pytesseract is used for text detection
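Note that pytesseract only wraps the Tesseract executable, which has to be installed separately. A minimal sketch of a pre-flight check follows, assuming the executable is called tesseract and is expected on the PATH; the helper name tesseract_available is ours and not part of pytesseract.

```python
import shutil


def tesseract_available(command="tesseract"):
    """Return True if an executable called `command` is found on the PATH."""
    return shutil.which(command) is not None


if not tesseract_available():
    # Assumption: the engine is installed under its default name.
    print("Tesseract executable not found on PATH; install it before running the examples.")
```

If the executable lives somewhere else, pytesseract lets you point to it by setting pytesseract.pytesseract.tesseract_cmd to its full path.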

def process_image(image_path):
    img = cv2.imread(image_path)
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    save_text = image_to_data(rgb, output_type=Output.DICT)
  • The process_image function takes an image path as input and performs text detection on the image.

  • It reads the image using cv2.imread, converts it to RGB format using cv2.cvtColor, and then uses image_to_data from pytesseract to extract the text data in dictionary format. The dictionary holds one entry per detected element, so an image containing many words can be handled in a single pass.
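To make the dictionary format concrete, here is a hand-written sample in the shape that image_to_data produces with output_type=Output.DICT. The column names follow Tesseract's TSV output; the values are made up and the sample is trimmed to three rows. The row helper is a hypothetical convenience, not a pytesseract function.

```python
# Made-up values; a real result usually also contains rows for
# blocks, paragraphs, and lines (levels 2 to 4).
sample = {
    "level":     [1, 5, 5],
    "page_num":  [1, 1, 1],
    "block_num": [0, 1, 1],
    "par_num":   [0, 1, 1],
    "line_num":  [0, 1, 1],
    "word_num":  [0, 1, 2],
    "left":      [0, 20, 95],
    "top":       [0, 30, 30],
    "width":     [200, 60, 70],
    "height":    [100, 22, 22],
    "conf":      [-1, 96, 41],
    "text":      ["", "Hello", "wor1d"],
}


def row(data, i):
    """Collect the i-th entry of every column into one dictionary."""
    return {key: values[i] for key, values in data.items()}


# Every column has the same length: one entry per detected element.
assert len({len(values) for values in sample.values()}) == 1

print(row(sample, 1)["text"], row(sample, 1)["conf"])
```

Each key maps to a list, and index i across all lists describes the same element. Rows that are not words (such as the page row at index 0) carry empty text and a confidence of -1.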

    for i in range(0, len(save_text["text"])):
        x = save_text["left"][i]
        y = save_text["top"][i]
        w = save_text["width"][i]
        h = save_text["height"][i]
        text = save_text["text"][i]
        confidence_level = int(save_text["conf"][i])
  • We then create a loop to iterate over each detected element in the image.

  • Next, we extract the bounding box's top-left corner, i.e., (x, y), and its dimensions, i.e., (w, h), from the save_text dictionary. Along with that, we get the detected text and its confidence level for each element.
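The same extraction can be sketched without OpenCV by turning the parallel lists into one record per element. WordBox and to_word_boxes are hypothetical names introduced for illustration, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """One detected element: its text, confidence, and bounding box."""
    text: str
    conf: float
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height

    @property
    def top_left(self):
        return (self.x, self.y)

    @property
    def bottom_right(self):
        return (self.x + self.w, self.y + self.h)


def to_word_boxes(data):
    """Turn the parallel lists of the dictionary into one record per entry."""
    return [
        WordBox(
            text=data["text"][i],
            conf=float(data["conf"][i]),
            x=data["left"][i],
            y=data["top"][i],
            w=data["width"][i],
            h=data["height"][i],
        )
        for i in range(len(data["text"]))
    ]


# Made-up values in the shape of the save_text dictionary.
sample = {
    "left": [20, 95], "top": [30, 30],
    "width": [60, 70], "height": [22, 22],
    "conf": [96, 41], "text": ["Hello", "wor1d"],
}
boxes = to_word_boxes(sample)
print(boxes[0].top_left, boxes[0].bottom_right)
```

The two corners printed here are the pair of points that cv2.rectangle expects: the top-left corner and the top-left corner shifted by the width and height.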

        if confidence_level > 75:
            cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 0), 2)
            (text_width, text_height), _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 0.6, 1)
            cv2.rectangle(img, (x, y - text_height - 5), (x + text_width, y), (255, 255, 255), -1)
            cv2.putText(img, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 0), 1)
  • The if statement filters out low-confidence detections. If confidence_level is greater than 75 (an adjustable threshold), we draw a rectangle around the detected text with cv2.rectangle, then write the text above it with cv2.putText, in black on a white background.
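The filtering step can be sketched on its own as follows. Two details here are assumptions beyond the code above: blank text entries are skipped, and the confidence is passed through float because, depending on the Tesseract and pytesseract versions, it may arrive as an integer, a float, or a string. The helper name confident_indices is hypothetical.

```python
def confident_indices(data, min_conf=75):
    """Return the indices of non-blank entries whose confidence exceeds min_conf."""
    keep = []
    for i, (text, conf) in enumerate(zip(data["text"], data["conf"])):
        # float() accepts 96, 96.5, "96.5", and "-1" alike.
        if text.strip() and float(conf) > min_conf:
            keep.append(i)
    return keep


# Made-up values: a page row, a confident word, a doubtful word, a blank entry.
sample = {
    "text": ["", "Hello", "wor1d", "  "],
    "conf": [-1, "96.5", 41, 99],
}
print(confident_indices(sample))
```

Raising min_conf removes more doubtful words at the cost of missing some correct ones; lowering it does the opposite.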

    return img
  • Finally, we return the image with the boxes and the detected text drawn on it.

if __name__ == "__main__":
    input_image_path = 'text7.png'
    processed_image = process_image(input_image_path)
    cv2.imshow("Image", processed_image)
    cv2.waitKey(0)
  • Finally, we add the script's entry point. The process_image function is called with the image path of our choice, and the processed image is displayed using cv2.imshow.

  • cv2.waitKey(0) keeps the window displaying the image open until any key is pressed.

Text detection in images

Putting all the code together, we can now detect text in images.

from pytesseract import Output, image_to_data
import cv2

def process_image(image_path):
    img = cv2.imread(image_path)
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    save_text = image_to_data(rgb, output_type=Output.DICT)

    for i in range(0, len(save_text["text"])):
        x = save_text["left"][i]
        y = save_text["top"][i]
        w = save_text["width"][i]
        h = save_text["height"][i]

        text = save_text["text"][i]
        confidence_level = int(save_text["conf"][i])

        if confidence_level > 75:
            cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 0), 2)
            (text_width, text_height), _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 0.6, 1)
            cv2.rectangle(img, (x, y - text_height - 5), (x + text_width, y), (255, 255, 255), -1)
            cv2.putText(img, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 0), 1)


    return img

if __name__ == "__main__":
    input_image_path = 'sample_img.png'
    processed_image = process_image(input_image_path)

    cv2.imshow("Image", processed_image)
    cv2.waitKey(0)

Text detection demonstration in images

Let's take a look at the output of the code above. A box is drawn around each detected word, and the recognized text is written above it.

Example one
Example two

Terminal output code

If you want to be able to copy the text once it is detected, you can print it on the terminal as well.

from pytesseract import Output, image_to_data
import cv2

def process_image(image_path):
    img = cv2.imread(image_path)
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    save_text = image_to_data(rgb, output_type=Output.DICT)

    for i in range(0, len(save_text["text"])):
        x = save_text["left"][i]
        y = save_text["top"][i]
        w = save_text["width"][i]
        h = save_text["height"][i]

        text = save_text["text"][i]
        confidence_level = int(save_text["conf"][i])

        if confidence_level > 75:
            cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 0), 2)
            (text_width, text_height), _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 0.6, 1)
            cv2.rectangle(img, (x, y - text_height - 5), (x + text_width, y), (255, 255, 255), -1)
            cv2.putText(img, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 0), 1)
            
            print(f"Confidence: {confidence_level}")
            print(f"Text: {text}\n")

    return img

if __name__ == "__main__":
    input_image_path = 'sample_img.png'
    processed_image = process_image(input_image_path)

    cv2.imshow("Image", processed_image)
    cv2.waitKey(0)

Printing the text on the terminal

This is how the text is shown on the terminal, along with the confidence levels.

Output on the terminal as well
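The code above prints one word at a time. If the goal is to copy the text, it may be more convenient to join the words back into lines. The sketch below does this using the block_num, par_num, and line_num columns of the image_to_data dictionary. It assumes the words appear in reading order within each line, and the helper name lines_of_text and the sample values are made up for illustration.

```python
from collections import defaultdict


def lines_of_text(data, min_conf=75):
    """Join confident words that share a block, paragraph, and line number."""
    lines = defaultdict(list)
    for i, text in enumerate(data["text"]):
        if not text.strip() or float(data["conf"][i]) <= min_conf:
            continue
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines[key].append(text)
    # Sorting the keys keeps blocks, paragraphs, and lines in document order.
    return [" ".join(words) for _, words in sorted(lines.items())]


# Made-up values: two words on the first line, two on the second (one doubtful).
sample = {
    "block_num": [1, 1, 1, 1],
    "par_num":   [1, 1, 1, 1],
    "line_num":  [1, 1, 2, 2],
    "conf":      [90, 88, 93, 30],
    "text":      ["Text", "detection", "works", "we11"],
}
for line in lines_of_text(sample):
    print(line)
```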

Text detection in videos

Using the same logic, we can also detect text in videos by reading the video frame by frame and running Tesseract on each frame. Because of motion blur and abrupt movement between frames, the results may be less accurate than on still images.

from pytesseract import Output, image_to_data
import cv2

def process_image(image):
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    save_text = image_to_data(rgb, output_type=Output.DICT)

    for i in range(0, len(save_text["text"])):
        x = save_text["left"][i]
        y = save_text["top"][i]
        w = save_text["width"][i]
        h = save_text["height"][i]

        text = save_text["text"][i]
        confidence_level = int(save_text["conf"][i])

        if confidence_level > 75:
            cv2.rectangle(image, (x, y), (x + w, y + h), (0, 0, 0), 2)
            (text_width, text_height), _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 0.6, 1)
            cv2.rectangle(image, (x, y - text_height - 5), (x + text_width, y), (255, 255, 255), -1)
            cv2.putText(image, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 0), 1)

    return image

def process_video(video_path):
    video = cv2.VideoCapture(video_path)

    while video.isOpened():
        ret, frame = video.read()
        if not ret:
            break

        processed_frame = process_image(frame)

        cv2.imshow('Video', processed_frame)

        if cv2.waitKey(1) & 0xFF == 27:  # 27 is the Esc key
            break


    video.release()
    cv2.destroyAllWindows()


if __name__ == "__main__":
    input_video_url = 'https://player.vimeo.com/external/581763177.sd.mp4?s=7c0e1dbf0a173ca1c9c3ac37a05c2498f905ad11&profile_id=165&oauth2_token_id=57447761'
    process_video(input_video_url)

Text detection demonstration in videos

Let's see how the text is detected frame by frame for our video. You can replace the URL and try it out on your videos!
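Running OCR on every frame can be slow, and consecutive frames often contain the same text. One possible optimization, which the code above does not use, is to run detection only on every nth frame. The sketch below shows just the frame-skipping part; every_nth is a hypothetical helper, and integers stand in for real frames, which would come from video.read() in practice.

```python
from itertools import islice


def every_nth(frames, n):
    """Return an iterator over frames 0, n, 2n, ... of any iterable of frames."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return islice(frames, 0, None, n)


# Stand-in for real frames: integers play the role of frame numbers.
fake_frames = range(10)
print(list(every_nth(fake_frames, 3)))
```

A larger n lowers the processing cost but makes it more likely that briefly visible text is missed.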

Test your knowledge of text detection!

Question

What does the image_to_data function do in Tesseract?

Copyright ©2024 Educative, Inc. All rights reserved