Tesseract text detection in computer vision
Computer vision involves extracting information from visual data and allows us to perform complex tasks such as classification, prediction, recognition, and much more! In this Answer, we'll look at how to detect text in media using Tesseract, a classic optical character recognition application.
Optical character recognition
Optical character recognition (OCR) is a revolutionary technology that enables machines to interpret images of text and convert them into machine-readable formats. It allows us to unlock the potential of printed or handwritten text.
Simply put, the goal of OCR is to take characters as humans perceive them and convert them into machine-encoded text.
Text detection
The concept of optical character recognition is used in text detection, where we aim to identify and recognize the text found within an image or a video. We will look into its implementation and applications shortly.
Tesseract
In this Answer, we will perform text detection using Tesseract through its Python wrapper, pytesseract. Tesseract is an open-source OCR engine developed by Google that converts text in media to machine-encoded text and is known to be efficient and accurate.
Generic code walkthrough
We'll learn how to load an image, detect its text, and visualize it.
from pytesseract import *
import cv2
Let's start by importing the necessary modules.
- cv2 is used for image processing
- pytesseract is used for text detection
def process_image(image_path):
    img = cv2.imread(image_path)
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    save_text = image_to_data(rgb, output_type=Output.DICT)
The process_image function takes an image path as input and performs text detection on the image. It reads the image using cv2.imread, converts it to RGB format using cv2.cvtColor, and then uses image_to_data from pytesseract to extract the text data as a dictionary. We use the dictionary format so that multiple words can be catered to.
for i in range(0, len(save_text["text"])):
    x = save_text["left"][i]
    y = save_text["top"][i]
    w = save_text["width"][i]
    h = save_text["height"][i]
    text = save_text["text"][i]
    confidence_level = int(save_text["conf"][i])
We then create a loop to iterate over each detected text block in the image.
Next, we extract the bounding box coordinates, i.e. (x, y), and dimensions, i.e. (w, h), from the save_text dictionary. Along with that, we get the detected text and its confidence level for each text block.
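To make the layout of this dictionary concrete, here is a small stand-in built by hand. The words, coordinates, and confidence values below are invented for illustration; the parallel-list structure and the int(...) cast mirror the walkthrough code.

```python
# A hand-made stand-in for the dictionary that image_to_data returns:
# every key maps to a list, and index i describes the i-th detected box.
save_text = {
    "left":   [10, 120],
    "top":    [20, 22],
    "width":  [90, 80],
    "height": [30, 28],
    "text":   ["Hello", "world"],
    "conf":   ["96", "88"],  # cast to int before comparing, as in the walkthrough
}

boxes = []
for i in range(0, len(save_text["text"])):
    x = save_text["left"][i]
    y = save_text["top"][i]
    w = save_text["width"][i]
    h = save_text["height"][i]
    boxes.append((save_text["text"][i], (x, y, w, h), int(save_text["conf"][i])))

print(boxes)  # [('Hello', (10, 20, 90, 30), 96), ('world', (120, 22, 80, 28), 88)]
```

Each entry now bundles a word with its box and confidence, which is exactly the information the drawing code consumes.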
if confidence_level > 75:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 0), 2)
    (text_width, text_height), _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 0.6, 1)
    cv2.rectangle(img, (x, y - text_height - 5), (x + text_width, y), (255, 255, 255), -1)
    cv2.putText(img, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 0), 1)
Using the if statement, the code filters out low-confidence text. If confidence_level is greater than 75 (a threshold that can be changed), we draw a rectangle around the detected text and write the text on the image using cv2.rectangle and cv2.putText, with black text on a white background.
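The effect of this threshold can be sketched in isolation. The (word, confidence) pairs below are made up for illustration; low confidence often corresponds to noise or partially visible characters.

```python
# Hypothetical (word, confidence) pairs standing in for Tesseract output.
detections = [("Text", 96), ("de#ect", 41), ("detection", 88), ("~", 12)]

threshold = 75  # same cutoff as in the walkthrough; tune it as needed
kept = [word for word, conf in detections if conf > threshold]
print(kept)  # ['Text', 'detection']
```

Raising the threshold trades recall for precision: fewer boxes are drawn, but the ones that remain are more likely to be real words.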
return img
We finally return the image on which the text and boxes have been identified.
if __name__ == "__main__":
    input_image_path = 'sample_img.png'
    processed_image = process_image(input_image_path)
    cv2.imshow("Image", processed_image)
    cv2.waitKey(0)
Finally, we create our main block. The process_image function is called with the image path of our choice, and the processed image is displayed using cv2.imshow. The window displaying the image stays open until a key is pressed, i.e. cv2.waitKey(0).
Text detection in images
Putting all the code together now, we can detect text in images effectively.
from pytesseract import *
import cv2

def process_image(image_path):
    img = cv2.imread(image_path)
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    save_text = image_to_data(rgb, output_type=Output.DICT)

    for i in range(0, len(save_text["text"])):
        x = save_text["left"][i]
        y = save_text["top"][i]
        w = save_text["width"][i]
        h = save_text["height"][i]
        text = save_text["text"][i]
        confidence_level = int(save_text["conf"][i])

        if confidence_level > 75:
            cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 0), 2)
            (text_width, text_height), _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 0.6, 1)
            cv2.rectangle(img, (x, y - text_height - 5), (x + text_width, y), (255, 255, 255), -1)
            cv2.putText(img, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 0), 1)

    return img

if __name__ == "__main__":
    input_image_path = 'sample_img.png'
    processed_image = process_image(input_image_path)
    cv2.imshow("Image", processed_image)
    cv2.waitKey(0)
Text detection demonstration in images
Let's take a look at the output of the above code below. We can see how a box is drawn around the text, and the detected text is written above it.
Terminal output code
If you want to be able to copy the text once it is detected, you can print it on the terminal as well.
from pytesseract import *
import cv2

def process_image(image_path):
    img = cv2.imread(image_path)
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    save_text = image_to_data(rgb, output_type=Output.DICT)

    for i in range(0, len(save_text["text"])):
        x = save_text["left"][i]
        y = save_text["top"][i]
        w = save_text["width"][i]
        h = save_text["height"][i]
        text = save_text["text"][i]
        confidence_level = int(save_text["conf"][i])

        if confidence_level > 75:
            cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 0), 2)
            (text_width, text_height), _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 0.6, 1)
            cv2.rectangle(img, (x, y - text_height - 5), (x + text_width, y), (255, 255, 255), -1)
            cv2.putText(img, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 0), 1)
            print(f"Confidence: {confidence_level}")
            print(f"Text: {text}\n")

    return img

if __name__ == "__main__":
    input_image_path = 'sample_img.png'
    processed_image = process_image(input_image_path)
    cv2.imshow("Image", processed_image)
    cv2.waitKey(0)
Printing the text on the terminal
This is how the text is shown on the terminal, along with the confidence levels.
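If you would rather copy one block of text than individual words, the high-confidence words can be joined into a single string. The words and confidence values below are invented for illustration:

```python
# Hypothetical detected words with their confidence levels.
words = ["Detected", "text", "@#", "in", "an", "image"]
confs = [96, 91, 14, 88, 90, 93]

# Keep only confident words and join them into one copy-ready line.
sentence = " ".join(w for w, c in zip(words, confs) if c > 75)
print(sentence)  # Detected text in an image
```

This mirrors the per-word filtering in the main loop, but produces one string instead of many print statements.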
Text detection in videos
Using the same logic, we can even detect text in videos. This can be achieved by breaking the video down frame by frame and then applying Tesseract detection to each frame. Due to motion and blur between frames, this might not be as accurate as detecting text in still images.
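Running OCR on every frame is also expensive. One common mitigation, not used in the code below, is to run Tesseract only on every Nth frame and reuse the last detections in between. A minimal sketch of the sampling logic, with a hypothetical stride parameter:

```python
def should_run_ocr(frame_index, stride=10):
    """Run OCR only on frames 0, stride, 2*stride, ... and skip the rest."""
    return frame_index % stride == 0

# Which of the first 25 frames would be sent to Tesseract with stride=10?
sampled = [i for i in range(25) if should_run_ocr(i, stride=10)]
print(sampled)  # [0, 10, 20]
```

Inside the video loop, you would call process_image only when should_run_ocr returns True and otherwise redraw the previous boxes.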
from pytesseract import *
import cv2

def process_image(image):
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    save_text = image_to_data(rgb, output_type=Output.DICT)

    for i in range(0, len(save_text["text"])):
        x = save_text["left"][i]
        y = save_text["top"][i]
        w = save_text["width"][i]
        h = save_text["height"][i]
        text = save_text["text"][i]
        confidence_level = int(save_text["conf"][i])

        if confidence_level > 75:
            cv2.rectangle(image, (x, y), (x + w, y + h), (0, 0, 0), 2)
            (text_width, text_height), _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 0.6, 1)
            cv2.rectangle(image, (x, y - text_height - 5), (x + text_width, y), (255, 255, 255), -1)
            cv2.putText(image, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 0), 1)

    return image

def process_video(video_path):
    video = cv2.VideoCapture(video_path)

    while video.isOpened():
        ret, frame = video.read()
        if not ret:
            break

        processed_frame = process_image(frame)
        cv2.imshow('Video', processed_frame)

        if cv2.waitKey(1) == 27:
            break

    video.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    input_video_url = 'https://player.vimeo.com/external/581763177.sd.mp4?s=7c0e1dbf0a173ca1c9c3ac37a05c2498f905ad11&profile_id=165&oauth2_token_id=57447761'
    process_video(input_video_url)
Text detection demonstration in videos
Let's see how the text is detected frame by frame for our video. You can replace the URL and try it out on your videos!