Gesture recognizer in deep learning
Deep learning has paved the way for numerous revolutionary techniques in the field of computer vision. With the possibility of making better models each day, it has become possible for us to apply such models in the domains of classification, recognition, and prediction. This is immensely useful in quite a lot of real-world applications that we'll see later on. Gesture recognition in images is one such example and precisely what we'll be targeting in this answer!
Gesture recognition
The process of recognizing what particular position our hand is in and what gesture it may indicate is known as gesture recognition. We can submit unlabelled images through gesture recognition applications, and a trained model can then predict what gesture the picture depicts.
MediaPipe and deep learning
MediaPipe is an open-source framework that provides various deep learning models that are trained to handle tasks like image classification, face and hand landmark detection, language detection, and more.
Gesture recognizer model
The model we will use for our application is a computer vision model from the MediaPipe framework. We can download the gesture_recognizer.task file from its official documentation. This task file serves as the trained model for our application, and we can use it directly to recognize gestures in new images.
Note: You can download the gesture_recognizer.task model from https://ai.google.dev/edge/mediapipe/solutions/vision/gesture_recognizer and reference it in your code.
Hand landmarks
A crucial first step in gesture recognition is detecting whether a hand exists in the image at all and, if so, identifying its coordinates. Hand landmarks are specific points on the hand used for tracking hand gestures.
We can then extract the hand landmarks, such as fingertips and palm center, to analyze and interpret various hand gestures accurately.
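To build intuition for how landmark coordinates can be interpreted, here is a minimal, self-contained sketch. The landmark indices follow MediaPipe's 21-point hand model (e.g., 8 is the index fingertip, 6 is the index PIP joint), but the coordinate values below are made up for illustration:

```python
# Indices follow MediaPipe's 21-point hand model:
# each finger's (tip, PIP joint) index pair, e.g. 8 = index fingertip, 6 = index PIP.
FINGERS = {"index": (8, 6), "middle": (12, 10), "ring": (16, 14), "pinky": (20, 18)}

def extended_fingers(landmarks):
    """Return fingers whose tip lies above its PIP joint.

    Landmarks are normalized (x, y) pairs; y grows downward in image
    coordinates, so a smaller y means "higher" in the image.
    """
    return [name for name, (tip, pip) in FINGERS.items()
            if landmarks[tip][1] < landmarks[pip][1]]

# Hypothetical landmarks for a hand with only the index finger raised.
sample = {8: (0.50, 0.20), 6: (0.50, 0.40),    # index: tip above PIP (extended)
          12: (0.55, 0.45), 10: (0.55, 0.35),  # middle: tip below PIP (curled)
          16: (0.60, 0.47), 14: (0.60, 0.37),  # ring: curled
          20: (0.65, 0.50), 18: (0.65, 0.40)}  # pinky: curled

print(extended_fingers(sample))  # ['index']
```

Real gesture models learn far richer patterns than this tip-above-joint heuristic, but the principle is the same: gestures are inferred from the relative positions of landmarks.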
Code walkthrough
```python
import cv2
import mediapipe as mp
from mediapipe.framework.formats import landmark_pb2
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
```
The first step is to import the necessary libraries for our code.
- `cv2` is OpenCV's library, mainly useful for image processing tasks.
- `mediapipe` offers the particular gesture recognition model we require.
```python
img_file = "path/image.png"
img_to_process = cv2.imread(img_file)
```
`img_file` refers to the path of the image we will predict the gestures for. We read the image and store it in the `img_to_process` variable using OpenCV's `imread` method.
```python
hands = mp.solutions.hands.Hands(min_detection_confidence=0.5, min_tracking_confidence=0.5)
rgb_format_img = cv2.cvtColor(img_to_process, cv2.COLOR_BGR2RGB)
results = hands.process(rgb_format_img)
```
MediaPipe offers a solution that recognizes hands within an image and generates the respective landmarks, i.e., coordinates of various points within the hand. We save an instance of this solution in `hands` and specify a confidence level of at least 50% for detection and tracking. Since OpenCV loads images in BGR format while MediaPipe processes images in RGB format, we first make the necessary conversion using the `cv2.cvtColor` method. The `results` variable stores the final landmarks when the `hands` solution is applied to `rgb_format_img`.
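For a 3-channel image, the BGR-to-RGB conversion amounts to reversing the channel axis. A quick NumPy sketch makes this concrete without needing OpenCV (the tiny 1x2 "image" below is made up for illustration):

```python
import numpy as np

# A 1x2 BGR "image": first pixel pure blue (B=255), second pure red (R=255).
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels, which is the
# effect of cv2.cvtColor(img, cv2.COLOR_BGR2RGB) on 3-channel images.
rgb = bgr[..., ::-1]

print(rgb[0, 0])  # [  0   0 255] -> in RGB order, this pixel is blue
```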
```python
hand_landmarks_list = []
if results.multi_hand_landmarks:
    for hand_landmarks in results.multi_hand_landmarks:
        hand_landmarks_protocol = landmark_pb2.NormalizedLandmarkList()
        hand_landmarks_protocol.landmark.extend([
            landmark_pb2.NormalizedLandmark(x=landmark.x, y=landmark.y, z=landmark.z)
            for landmark in hand_landmarks.landmark
        ])
        hand_landmarks_list.append(hand_landmarks_protocol)
```
The detected hand landmarks are represented as coordinates of various points within the hand. We extract and store these landmarks in `hand_landmarks_list` as a list of `NormalizedLandmarkList` objects from the `landmark_pb2` module.
```python
mp_drawing_styles = mp.solutions.drawing_styles
mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands
```
We define objects from the MediaPipe `mp` module for drawing and styling the landmarks on the image.
```python
if hand_landmarks_list:
    copied_image = img_to_process.copy()
    for landmark in hand_landmarks_list:
        mp_drawing.draw_landmarks(
            copied_image,
            landmark,
            mp_hands.HAND_CONNECTIONS,
            mp_drawing_styles.get_default_hand_landmarks_style(),
            mp_drawing_styles.get_default_hand_connections_style()
        )
    base_options = python.BaseOptions(model_asset_path='gesture_recognizer.task')
    options = vision.GestureRecognizerOptions(base_options=base_options)
    recognizer = vision.GestureRecognizer.create_from_options(options)
    image = mp.Image.create_from_file(img_file)
    recognition_result = recognizer.recognize(image)
    top_gesture = recognition_result.gestures[0][0]
    gesture_prediction = f"{top_gesture.category_name} ({top_gesture.score:.2f})"
    cv2.putText(copied_image, gesture_prediction, (10, copied_image.shape[0] - 20), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
    cv2.imshow("Guess the gesture!", copied_image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
else:
    print("No hands were detected!")
```
If the `results` variable contains hand landmarks, we proceed with visualizing the landmarks on the input image:

- We create a copy of the input image, `copied_image`, to draw the landmarks on. We then use the `mp_drawing.draw_landmarks` function to draw the hand landmarks using hand connections and landmark styles.
- We specify the path to our model and the required options using `python.BaseOptions` and `vision.GestureRecognizerOptions`. Our model is referenced through "gesture_recognizer.task".
- We initialize a gesture recognition model using the `vision.GestureRecognizer` class and recognize the gesture based on the hand landmarks.
- The recognized gesture is stored in `gesture_prediction` and then drawn onto the image using `cv2.putText`, showing the gesture category name and its corresponding score.
- Finally, we display `copied_image` with the recognized gesture using `cv2.imshow`. The window remains open until a key is pressed.

If no hands are detected in the image, we print "No hands were detected!".
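For clarity, `recognition_result.gestures` is a list with one entry per detected hand, and each entry is a list of candidate categories sorted by descending score. The indexing and formatting used above can be sketched with plain tuples standing in for MediaPipe's result objects (the category names and scores below are made up):

```python
# Shape mirrors recognition_result.gestures: one inner list per detected hand,
# with candidates sorted by descending score.
gestures = [[("Thumb_Up", 0.93), ("Open_Palm", 0.04)]]

top_name, top_score = gestures[0][0]      # first hand, highest-scoring candidate
label = f"{top_name} ({top_score:.2f})"   # same format the walkthrough draws with cv2.putText
print(label)  # Thumb_Up (0.93)
```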
Executable code
Yay, we've completed our code walkthrough and can now see the code in action. You can edit the code window below and click "Run" to see the results.
```python
import cv2
import mediapipe as mp
from mediapipe.framework.formats import landmark_pb2
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

img_file = "image2.png"
img_to_process = cv2.imread(img_file)

hands = mp.solutions.hands.Hands(min_detection_confidence=0.5, min_tracking_confidence=0.5)
rgb_format_img = cv2.cvtColor(img_to_process, cv2.COLOR_BGR2RGB)
results = hands.process(rgb_format_img)

hand_landmarks_list = []
if results.multi_hand_landmarks:
    for hand_landmarks in results.multi_hand_landmarks:
        hand_landmarks_protocol = landmark_pb2.NormalizedLandmarkList()
        hand_landmarks_protocol.landmark.extend([
            landmark_pb2.NormalizedLandmark(x=landmark.x, y=landmark.y, z=landmark.z)
            for landmark in hand_landmarks.landmark
        ])
        hand_landmarks_list.append(hand_landmarks_protocol)

mp_drawing_styles = mp.solutions.drawing_styles
mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands

if hand_landmarks_list:
    copied_image = img_to_process.copy()
    for landmark in hand_landmarks_list:
        mp_drawing.draw_landmarks(
            copied_image,
            landmark,
            mp_hands.HAND_CONNECTIONS,
            mp_drawing_styles.get_default_hand_landmarks_style(),
            mp_drawing_styles.get_default_hand_connections_style()
        )
    base_options = python.BaseOptions(model_asset_path='gesture_recognizer.task')
    options = vision.GestureRecognizerOptions(base_options=base_options)
    recognizer = vision.GestureRecognizer.create_from_options(options)
    image = mp.Image.create_from_file(img_file)
    recognition_result = recognizer.recognize(image)
    top_gesture = recognition_result.gestures[0][0]
    gesture_prediction = f"{top_gesture.category_name} ({top_gesture.score:.2f})"
    cv2.putText(copied_image, gesture_prediction, (10, copied_image.shape[0] - 20), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
    cv2.imshow("Guess the gesture!", copied_image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
else:
    print("No hands were detected!")
```

Gesture recognition demonstration
Real-life applications
A wonderful aspect of such technologies is that they are crucial to many revolutionary domains in real life. Let's see how gesture recognition is important around us!
| Use cases | Explanation |
| --- | --- |
| Human-computer interaction | Enables users to interact with computers, mobiles, or devices using hand gestures; used for gesture-based navigation and performing tasks |
| Gaming | Enhances gaming experiences by allowing players to control characters and actions; popular in motion-controlled games |
| Virtual reality | Enables users to interact with virtual environments using hand gestures; provides a natural and intuitive way to pick up objects, manipulate virtual elements, and navigate |
| Sign language interpretation | Converts sign language gestures into text or speech, aiding communication |
| Augmented reality | Allows users to interact with digital content overlaid on the real world using hand gestures |
| Assistive technology | Helps individuals with physical disabilities control devices |
What does the score in the model represent? The score is the model's confidence, a value between 0 and 1, that the detected hand matches the predicted gesture category.