Sign language translator using OpenCV
Sign language is a crucial communication tool for individuals with hearing impairments. In this Answer, we’ll explore how to create a real-time hand gesture recognition application, known as the sign language converter, using Python and the Tkinter library for graphical user interface (GUI) design. This application bridges the gap between sign language and text, allowing communication between those who use sign language and those who may not understand it.
Technologies used
Before diving into the code, let’s understand the key technologies and libraries that we will use.
Tkinter: A GUI library in Python that provides a set of tools for creating interactive graphical user interfaces.
OpenCV: An open-source computer vision library that allows us to work with images and videos and perform various image-processing tasks.
Mediapipe: A library developed by Google that provides ready-to-use solutions for various tasks, including hand tracking and pose estimation.
Pyttsx3: A text-to-speech conversion library that enables the application to provide audio feedback.
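Before running the code, these libraries need to be installed. A typical setup looks like the following (package names are the usual PyPI names for these imports; exact versions may vary, and Tkinter ships with most Python distributions):

```shell
# Install the third-party dependencies used in this Answer
pip install opencv-python mediapipe pyttsx3 pillow
```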
Setting up the GUI
We begin by importing the necessary libraries and creating the main application window with appropriate dimensions, background color, and title.
from tkinter import *
from PIL import Image, ImageTk
import cv2
from tkinter import filedialog
import mediapipe as mp
import pyttsx3

win = Tk()
width = win.winfo_screenwidth()
height = win.winfo_screenheight()
win.geometry("%dx%d" % (width, height))
win.configure(bg="#FFFFF7")
win.title('Sign Language Converter')
Defining global variables
Several global variables are defined to store various elements of the application, such as images, hand-tracking results, GUI components, and more.
global img, finalImage, finger_tips, thumb_tip, cap, image, rgb, hand, results, _, w, h, status, mpDraw, mpHands, hands, label1, btn, btn2
Initializing hand detection
The wine function initializes the hand detection setup, configuring webcam access using OpenCV’s VideoCapture and setting up the Hands object from the Mediapipe library.
def wine():
    global finger_tips, thumb_tip, mpDraw, mpHands, cap, w, h, hands, label1, check, img
    finger_tips = [8, 12, 16, 20]
    thumb_tip = 4
    w = 500
    h = 400
    label1 = Label(win, width=w, height=h, bg="#FFFFF7")
    label1.place(x=40, y=200)
    mpHands = mp.solutions.hands  # MediaPipe hands solution
    hands = mpHands.Hands()  # The Hands object for landmark detection
    mpDraw = mp.solutions.drawing_utils
    cap = cv2.VideoCapture(0)
Gesture recognition and interpretation
The live function processes webcam frames, detects hand landmarks using Mediapipe, and interprets hand gestures based on finger positions and orientations. Gestures such as "STOP", "OKAY", "VICTORY", and more are recognized based on the relative positions of the landmarks.
def live():
    global v
    global upCount
    global cshow, img
    cshow = 0
    upCount = StringVar()
    _, img = cap.read()
    img = cv2.resize(img, (w, h))
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = hands.process(rgb)
    if results.multi_hand_landmarks:
        for hand in results.multi_hand_landmarks:
            lm_list = []
            for id, lm in enumerate(hand.landmark):
                lm_list.append(lm)
            finger_fold_status = []
            for tip in finger_tips:
                if lm_list[tip].x < lm_list[tip - 2].x:
                    finger_fold_status.append(True)
                else:
                    finger_fold_status.append(False)
            # stop
            if lm_list[4].y < lm_list[2].y and lm_list[8].y < lm_list[6].y and lm_list[12].y < lm_list[10].y and \
                    lm_list[16].y < lm_list[14].y and lm_list[20].y < lm_list[18].y and \
                    lm_list[17].x < lm_list[0].x < lm_list[5].x:
                cshow = 'STOP ! Dont move.'
                upCount.set('STOP ! Dont move.')
            # okay
            elif lm_list[4].y < lm_list[2].y and lm_list[8].y > lm_list[6].y and lm_list[12].y < lm_list[10].y and \
                    lm_list[16].y < lm_list[14].y and lm_list[20].y < lm_list[18].y and \
                    lm_list[17].x < lm_list[0].x < lm_list[5].x:
                cshow = 'Perfect , You did a great job.'
                upCount.set('Perfect , You did a great job.')
            # spidey
            elif lm_list[4].y < lm_list[2].y and lm_list[8].y < lm_list[6].y and lm_list[12].y > lm_list[10].y and \
                    lm_list[16].y > lm_list[14].y and lm_list[20].y < lm_list[18].y and \
                    lm_list[17].x < lm_list[0].x < lm_list[5].x:
                cshow = 'Good to see you.'
                upCount.set('Good to see you.')
            # point
            elif lm_list[8].y < lm_list[6].y and lm_list[12].y > lm_list[10].y and \
                    lm_list[16].y > lm_list[14].y and lm_list[20].y > lm_list[18].y:
                cshow = 'You Come here.'
                upCount.set('You Come here.')
            # victory
            elif lm_list[8].y < lm_list[6].y and lm_list[12].y < lm_list[10].y and \
                    lm_list[16].y > lm_list[14].y and lm_list[20].y > lm_list[18].y:
                cshow = 'Yes , we won.'
                upCount.set('Yes , we won.')
            # left
            elif lm_list[4].y < lm_list[2].y and lm_list[8].x < lm_list[6].x and lm_list[12].x > lm_list[10].x and \
                    lm_list[16].x > lm_list[14].x and lm_list[20].x > lm_list[18].x and lm_list[5].x < lm_list[0].x:
                cshow = 'Move Left'
                upCount.set('Move Left')
            # right
            elif lm_list[4].y < lm_list[2].y and lm_list[8].x > lm_list[6].x and lm_list[12].x < lm_list[10].x and \
                    lm_list[16].x < lm_list[14].x and lm_list[20].x < lm_list[18].x:
                cshow = 'Move Right'
                upCount.set('Move Right')
            if all(finger_fold_status):
                # like
                if lm_list[thumb_tip].y < lm_list[thumb_tip - 1].y < lm_list[thumb_tip - 2].y and \
                        lm_list[0].x < lm_list[3].x:
                    cshow = 'I Like it'
                    upCount.set('I Like it')
                # dislike
                elif lm_list[thumb_tip].y > lm_list[thumb_tip - 1].y > lm_list[thumb_tip - 2].y and \
                        lm_list[0].x < lm_list[3].x:
                    cshow = 'I dont like it.'
                    upCount.set('I dont like it.')
            mpDraw.draw_landmarks(rgb, hand, mpHands.HAND_CONNECTIONS)
    cv2.putText(rgb, f'{cshow}', (10, 50), cv2.FONT_HERSHEY_COMPLEX, .75, (0, 255, 255), 2)
    image = Image.fromarray(rgb)
    finalImage = ImageTk.PhotoImage(image)
    label1.configure(image=finalImage)
    label1.image = finalImage
    win.after(1, live)
    crr = Label(win, text='Current Status :', font=('Helvetica', 18, 'bold'), bd=5, bg='gray', width=15, fg='#232224', relief=GROOVE)
    status = Label(win, textvariable=upCount, font=('Helvetica', 18, 'bold'), bd=5, bg='gray', width=50, fg='#232224', relief=GROOVE)
    status.place(x=400, y=700)
    crr.place(x=120, y=700)
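The landmark-comparison rules above can be understood (and tested) in isolation from the webcam loop. Below is a minimal sketch, not part of the original application, that reimplements two of the rules as a pure function over a stand-in Landmark type. The indices follow MediaPipe's hand-landmark numbering: a finger counts as "up" when its tip landmark sits above (has a smaller y than) its PIP joint.

```python
from collections import namedtuple

# Minimal stand-in for a MediaPipe landmark (normalized x, y coordinates).
Landmark = namedtuple("Landmark", ["x", "y"])

def classify_gesture(lm):
    """Classify a 21-landmark hand using two of the rules from live().

    A finger is "up" when its tip (8, 12, 16, 20) is above its PIP
    joint (6, 10, 14, 18), i.e. has a smaller y value.
    """
    index_up = lm[8].y < lm[6].y
    middle_up = lm[12].y < lm[10].y
    ring_up = lm[16].y < lm[14].y
    pinky_up = lm[20].y < lm[18].y

    if index_up and middle_up and not ring_up and not pinky_up:
        return 'Yes , we won.'   # victory: index + middle raised
    if index_up and not middle_up and not ring_up and not pinky_up:
        return 'You Come here.'  # point: only index raised
    return 'unknown'

# Synthetic hand: all 21 landmarks at y=0.5, then raise index and middle tips.
hand = [Landmark(0.5, 0.5)] * 21
hand[8] = Landmark(0.5, 0.2)   # index tip above its PIP joint (index 6)
hand[12] = Landmark(0.5, 0.2)  # middle tip above its PIP joint (index 10)
print(classify_gesture(hand))  # → Yes , we won.
```

Feeding synthetic landmarks like this is a quick way to sanity-check a rule before pointing a real camera at it.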
Integrating voice feedback
The voice function uses the Pyttsx3 library to provide audio feedback. When called, it converts the recognized gesture message into speech and plays it using the system’s default audio output.
def voice():
    engine = pyttsx3.init()
    engine.say(upCount.get())
    engine.runAndWait()
Video playback and gesture recognition
The video function lets us load external video files. It opens a file dialog using Tkinter's filedialog module, captures frames from the selected video file, and calls the live function to perform gesture recognition on them.
def video():
    global cap, ex, label1
    filename = filedialog.askopenfilename(initialdir="/", title="Select file",
                                          filetypes=(("mp4 files", "*.mp4"), ("all files", "*.*")))
    cap = cv2.VideoCapture(filename)
    live()
Adding flexibility with widgets
Buttons are added to the GUI using Tkinter. They provide functionality such as switching between the live webcam feed and a loaded video file, playing the recognized message aloud, and exiting the application.
Button(win, text='Live', ..., command=live).place(x=width-250, y=400)
Button(win, text='Video', ..., command=video).place(x=width-250, y=450)
Button(win, text='Sound', ..., command=voice).place(x=width-250, y=500)
Button(win, text='Change Vid', ..., command=video).place(x=width-250, y=550)
Button(win, text='Exit', ..., command=win.destroy).place(x=width-250, y=600)
Creating text label
A label is created to display the current gesture status on the GUI. The textvariable attribute dynamically updates the label’s text with the recognized gesture message, providing real-time visual feedback.
Label(win, textvariable=upCount, ... ).place(x=400, y=700)
Implementation
Let's now see how the project works by running it.
from tkinter import *
from PIL import Image, ImageTk
import cv2
from tkinter import filedialog
import mediapipe as mp
import pyttsx3
win = Tk()
width=win.winfo_screenwidth()
height=win.winfo_screenheight()
win.geometry("%dx%d" % (width, height))
win.configure(bg="#FFFFF7")
win.title('Sign Language Converter')
global img,finalImage,finger_tips,thumb_tip,cap, image, rgb, hand, results, _, w, \
h,status,mpDraw,mpHands,hands,label1,btn,btn2
cap=None
Label(win,text='Sign Language Converter',font=('Helvetica',18,'italic'),bd=5,bg='#199ef3',fg='white',relief=SOLID,width=200 )\
.pack(pady=15,padx=300)
def wine():
global finger_tips, thumb_tip, mpDraw, mpHands, cap, w, h, hands, label1, check, img
finger_tips = [8, 12, 16, 20]
thumb_tip = 4
w = 500
h = 400
if cap:
cap.release() # Release the previous video capture
label1 = Label(win, width=w, height=h, bg="#FFFFF7")
label1.place(x=40, y=200)
mpHands = mp.solutions.hands
hands = mpHands.Hands()
mpDraw = mp.solutions.drawing_utils
cap = cv2.VideoCapture(0)
###########################################Detection##########################################
def live():
global v
global upCount
global cshow,img
cshow=0
upCount = StringVar()
_, img = cap.read()
img = cv2.resize(img, (w, h))
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
results = hands.process(rgb)
if results.multi_hand_landmarks:
for hand in results.multi_hand_landmarks:
lm_list = []
for id, lm in enumerate(hand.landmark):
lm_list.append(lm)
finger_fold_status = []
for tip in finger_tips:
x, y = int(lm_list[tip].x * w), int(lm_list[tip].y * h)
if lm_list[tip].x < lm_list[tip - 2].x:
finger_fold_status.append(True)
else:
finger_fold_status.append(False)
print(finger_fold_status)
x, y = int(lm_list[8].x * w), int(lm_list[8].y * h)
print(x, y)
# stop
if lm_list[4].y < lm_list[2].y and lm_list[8].y < lm_list[6].y and lm_list[12].y < lm_list[10].y and \
lm_list[16].y < lm_list[14].y and lm_list[20].y < lm_list[18].y and lm_list[17].x < lm_list[0].x < \
lm_list[5].x:
cshow = 'STOP ! Dont move.'
upCount.set('STOP ! Dont move.')
print('STOP ! Dont move.')
# okay
elif lm_list[4].y < lm_list[2].y and lm_list[8].y > lm_list[6].y and lm_list[12].y < lm_list[10].y and \
lm_list[16].y < lm_list[14].y and lm_list[20].y < lm_list[18].y and lm_list[17].x < lm_list[0].x < \
lm_list[5].x:
cshow = 'Perfect , You did a great job.'
print('Perfect , You did a great job.')
upCount.set('Perfect , You did a great job.')
# spidey
elif lm_list[4].y < lm_list[2].y and lm_list[8].y < lm_list[6].y and lm_list[12].y > lm_list[10].y and \
lm_list[16].y > lm_list[14].y and lm_list[20].y < lm_list[18].y and lm_list[17].x < lm_list[0].x < \
lm_list[5].x:
cshow = 'Good to see you.'
print(' Good to see you. ')
upCount.set('Good to see you.')
# Point
elif lm_list[8].y < lm_list[6].y and lm_list[12].y > lm_list[10].y and \
lm_list[16].y > lm_list[14].y and lm_list[20].y > lm_list[18].y:
upCount.set('You Come here.')
print("You Come here.")
cshow = 'You Come here.'
# Victory
elif lm_list[8].y < lm_list[6].y and lm_list[12].y < lm_list[10].y and \
lm_list[16].y > lm_list[14].y and lm_list[20].y > lm_list[18].y:
upCount.set('Yes , we won.')
print("Yes , we won.")
cshow = 'Yes , we won.'
# Left
elif lm_list[4].y < lm_list[2].y and lm_list[8].x < lm_list[6].x and lm_list[12].x > lm_list[10].x and \
lm_list[16].x > lm_list[14].x and lm_list[20].x > lm_list[18].x and lm_list[5].x < lm_list[0].x:
upCount.set('Move Left')
print(" MOVE LEFT")
cshow = 'Move Left'
# Right
elif lm_list[4].y < lm_list[2].y and lm_list[8].x > lm_list[6].x and lm_list[12].x < lm_list[10].x and \
lm_list[16].x < lm_list[14].x and lm_list[20].x < lm_list[18].x:
upCount.set('Move Right')
print("Move RIGHT")
cshow = 'Move Right'
if all(finger_fold_status):
# like
if lm_list[thumb_tip].y < lm_list[thumb_tip - 1].y < lm_list[thumb_tip - 2].y and lm_list[0].x < lm_list[3].x:
print("I like it")
upCount.set('I Like it')
cshow = 'I Like it'
# Dislike
elif lm_list[thumb_tip].y > lm_list[thumb_tip - 1].y > lm_list[thumb_tip - 2].y and lm_list[0].x < lm_list[3].x:
upCount.set('I dont like it.')
print(" I dont like it.")
cshow = 'I dont like it.'
mpDraw.draw_landmarks(rgb, hand, mpHands.HAND_CONNECTIONS)
cv2.putText(rgb, f'{cshow}', (10, 50),
cv2.FONT_HERSHEY_COMPLEX, .75, (0, 255, 255), 2)
image = Image.fromarray(rgb)
finalImage = ImageTk.PhotoImage(image)
label1.configure(image=finalImage)
label1.image = finalImage
win.after(1, live)
crr=Label(win,text='Current Status :',font=('Helvetica',18,'bold'),bd=5,bg='gray',width=15,fg='#232224',relief=GROOVE )
status = Label(win,textvariable=upCount,font=('Helvetica',18,'bold'),bd=5,bg='gray',width=50,fg='#232224',relief=GROOVE )
status.place(x=400,y=700)
crr.place(x=120,y=700)
def voice():
engine = pyttsx3.init()
engine.say((upCount.get()))
engine.runAndWait()
def video():
global cap, ex, label1
if cap:
cap.release() # Release the previous video capture
filename = filedialog.askopenfilename(initialdir="/", title="Select file",
filetypes=(("mp4 files", "*.mp4"), ("all files", "*.*")))
cap = cv2.VideoCapture(filename)
w = 500
h = 400
label1 = Label(win, width=w, height=h, relief=GROOVE)
label1.place(x=40, y=200)
live()
wine()
Button(win, text='Live',padx=95,bg='#199ef3',fg='white',relief=RAISED ,width=1,bd=5,font=('Helvetica',12,'bold'),command=live)\
.place(x=width-250,y=400)
Button(win, text='Video',padx=95,bg='#199ef3',fg='white',relief=RAISED,width=1,bd=5,font=('Helvetica',12,'bold') ,command= video)\
.place(x=width-250,y=450)
Button(win,text='Sound',padx=95,bg='#199ef3',fg='white',relief=RAISED,width=1,bd=5,font=('Helvetica',12,'bold') ,command=voice)\
.place(x=width-250,y=500)
Button(win,text='Change Vid',padx=95,bg='#199ef3',fg='white',relief=RAISED ,width=1,bd=5,font=('Helvetica',12,'bold'),command=video)\
.place(x=width-250,y=550)
Button(win,text='Exit',padx=95,bg='#199ef3',fg='white',relief=RAISED,width=1,bd=5,font=('Helvetica',12,'bold') ,command=win.destroy)\
.place(x=width-250,y=600)
win.mainloop()

Due to Docker environment restrictions, the live functionality may not work here; however, running the application locally should work correctly.
Output
This is how the Live button works.
In the video case, the application extracts frames from the selected video file and passes them to the live function for gesture recognition.
Conclusion
This project shows how technology can bring people together, regardless of how they communicate. Using Python, Tkinter, OpenCV, Mediapipe, and Pyttsx3, we've built an application that demonstrates how software can bridge different modes of communication.
However, it's important to realize that sign language has a wide variety of gestures, and manually detecting landmarks for each gesture might not be practical. This is where machine learning (ML) comes into play. ML can help us create a system that learns to recognize different signs on its own.