Pose detection in deep learning
Deep learning is a subfield of artificial intelligence focused on training neural networks to perform complex tasks, while computer vision deals with extracting information from visual data. Together, these fields enable numerous detection tasks, such as pose detection. In this answer, we'll implement pose detection in practice.
Pose detection
Pose detection, also known as human pose estimation, involves identifying the key points in a human body, like hands, legs, joints, and other body parts, from an image or video.
Gesture recognition as an example
One further specific application of pose detection is gesture recognition, where we can specifically analyze hand poses to identify specific gestures. For instance, recognizing a thumbs-up gesture can be used in interactions with devices or even controlling virtual elements.
Note: Learn how to implement gesture recognition here.
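To make the gesture recognition idea concrete, here is a toy heuristic (our own illustration, not MediaPipe's API): a thumbs-up can be approximated by checking whether the thumb tip sits above every other fingertip. The landmark names and coordinates below are hypothetical, and we assume normalized coordinates where y increases downward, as MediaPipe uses.

```python
# Toy thumbs-up heuristic on hypothetical hand landmarks.
# Coordinates are normalized to [0, 1]; y grows downward, so a
# smaller y means higher up in the frame.

def is_thumbs_up(landmarks):
    """landmarks: dict mapping landmark names to (x, y) tuples."""
    thumb_tip_y = landmarks["thumb_tip"][1]
    other_tips = ["index_tip", "middle_tip", "ring_tip", "pinky_tip"]
    # Thumbs-up if the thumb tip is above all other fingertips.
    return all(thumb_tip_y < landmarks[name][1] for name in other_tips)

# Illustrative coordinates: the thumb tip is highest in the frame.
sample = {
    "thumb_tip": (0.40, 0.20),
    "index_tip": (0.50, 0.45),
    "middle_tip": (0.52, 0.50),
    "ring_tip": (0.54, 0.52),
    "pinky_tip": (0.56, 0.55),
}
print(is_thumbs_up(sample))  # True
```

A real gesture recognizer would work from actual detected landmarks rather than hand-picked coordinates, but the thresholding idea is the same.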
MediaPipe and deep learning
MediaPipe is an open-source framework that offers a collection of pre-trained deep-learning models, including pose detection. The main advantage is that such a model can be easily integrated into our custom computer vision applications.
Pose landmark model
To demonstrate pose detection, we will use MediaPipe's pose_landmarker.task model. This pre-trained model allows us to recognize the various landmarks in one's pose.
Note: You can download this model here.
Pose landmarks
Pose landmarks are key points on the human body that define its pose and positioning. These landmarks represent specific body parts, like hands, legs, and joints, and are essential for accurately interpreting and recognizing human actions and gestures.
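One detail worth knowing is that MediaPipe reports landmark coordinates normalized to the range [0, 1] relative to the image width and height. The small helper below (our own utility, not part of MediaPipe) sketches how to convert such a normalized landmark to pixel coordinates:

```python
# Convert a normalized landmark coordinate to pixel coordinates.
# MediaPipe landmarks carry x and y in [0, 1], relative to image size.

def to_pixel(x_norm, y_norm, img_width, img_height):
    return int(x_norm * img_width), int(y_norm * img_height)

# Example: a landmark at (0.5, 0.25) on a 640x480 image.
print(to_pixel(0.5, 0.25, 640, 480))  # (320, 120)
```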
Pose detection in images
To keep things simple, we'll start off by understanding how to apply this model to single images.
Code walkthrough
We will use Python and OpenCV for image processing and visualization, along with MediaPipe for pose detection.
import cv2
import mediapipe as mp
First and foremost, we import the necessary libraries for our code.
img_file = "pose1.png"
img = cv2.imread(img_file)
Next, we define the image file we want to process using img_file and read it using the cv2.imread method, saving it as img.
mp_pose = mp.solutions.pose.Pose(
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5)
This is a crucial step, where we define an instance of the pose detection solution, mp_pose, with specified confidence levels for detection and tracking. This means that for a pose landmark to be considered, the model must be at least this certain.
rgb_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
results = mp_pose.process(rgb_img)
To process the image with the pose detection model, we convert it to the RGB format using cv2.cvtColor, since OpenCV loads images in BGR order while MediaPipe expects RGB. The results of the detection are stored in results.
copied_image = img.copy()
if results.pose_landmarks:
    mp.solutions.drawing_utils.draw_landmarks(
        copied_image,
        results.pose_landmarks,
        mp.solutions.pose.POSE_CONNECTIONS,
        mp.solutions.drawing_styles.get_default_pose_landmarks_style())
We then create a copy of the original image, copied_image, to draw the pose landmarks on. If pose landmarks are detected in the image (i.e., the if statement is true), we visualize them on the copied image using the mp.solutions.drawing_utils.draw_landmarks function, while mp.solutions.drawing_styles.get_default_pose_landmarks_style provides the default styling for the landmarks.
cv2.imshow("Detecting poses", copied_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
After visualizing the pose landmarks on the image, we display it using cv2.imshow. The displayed window remains open until a key is pressed, after which cv2.destroyAllWindows closes it. If no pose landmarks are detected, the image is simply displayed without any annotations.
Complete code
import cv2
import mediapipe as mp

# Read the input image
img_file = "pose1.png"
img = cv2.imread(img_file)

# Initialize the pose detection solution
mp_pose = mp.solutions.pose.Pose(
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5)

# Convert BGR to RGB and run pose detection
rgb_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
results = mp_pose.process(rgb_img)

# Draw the detected landmarks on a copy of the image
copied_image = img.copy()
if results.pose_landmarks:
    mp.solutions.drawing_utils.draw_landmarks(
        copied_image,
        results.pose_landmarks,
        mp.solutions.pose.POSE_CONNECTIONS,
        mp.solutions.drawing_styles.get_default_pose_landmarks_style()
    )

# Display the annotated image until a key is pressed
cv2.imshow("Detecting poses", copied_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
Demonstrating pose detection in images
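Once landmarks are available, we can derive useful quantities from them. As an illustrative sketch (our own helper, not a MediaPipe function), the angle at a joint such as the elbow can be computed from three landmark positions, here given as made-up normalized (x, y) coordinates:

```python
import math

# Angle (in degrees) at point b, formed by the segments b->a and b->c.
# For an elbow angle, a, b, c would be the shoulder, elbow, and wrist
# landmarks. The coordinates below are illustrative, not model output.

def joint_angle(a, b, c):
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0])
        - math.atan2(a[1] - b[1], a[0] - b[0]))
    ang = abs(ang)
    # Keep the angle in the [0, 180] range.
    return 360 - ang if ang > 180 else ang

shoulder, elbow, wrist = (0.5, 0.3), (0.6, 0.5), (0.5, 0.7)
print(round(joint_angle(shoulder, elbow, wrist), 1))  # 126.9
```

Because MediaPipe's coordinates are normalized, such angle computations work regardless of the image resolution.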
Pose detection in videos
We can also detect the changing poses in videos! This is achieved by treating each frame of the video as a single image and applying the landmark detection to each frame. When this process is carried out continuously, we get a video whose pose landmarks update in every frame. Let's see the code in action.
import cv2
import mediapipe as mp

# Initialize the pose detection solution
mp_pose = mp.solutions.pose.Pose(
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5)

# Open the video stream
cap = cv2.VideoCapture("https://player.vimeo.com/external/206207511.sd.mp4?s=797bab17ff9fce2a8973fd5c6c161d8d80f76f7b&profile_id=164&oauth2_token_id=57447761")

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Convert each frame to RGB and run pose detection
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    results = mp_pose.process(rgb_frame)

    # Draw the detected landmarks on the frame
    if results.pose_landmarks:
        mp.solutions.drawing_utils.draw_landmarks(
            frame,
            results.pose_landmarks,
            mp.solutions.pose.POSE_CONNECTIONS,
            mp.solutions.drawing_styles.get_default_pose_landmarks_style()
        )

    cv2.imshow("Detecting poses in videos", frame)
    # Exit when the Esc key is pressed
    if cv2.waitKey(1) == 27:
        break

cap.release()
cv2.destroyAllWindows()
In the code above, we go through each frame of the video by defining a while loop that runs as long as the video is streaming. We read each frame and then apply the same steps discussed for pose detection in images. The streaming ends once the video is over or the Esc key (i.e., cv2.waitKey(1) == 27) is pressed.
Demonstrating pose detection in videos
Now, let's take a look at how poses are continuously detected in each video frame below.
Note: You can change the link of the video to any video of your choice or even use local video clips.
Pose detection using the webcam
Passing 0 to the cv2.VideoCapture function redirects the video stream to your local machine's web camera. In this way, you can use the code to detect the shifts in your poses in real time!
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose.Pose(
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5)

cap = cv2.VideoCapture(0)

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    results = mp_pose.process(rgb_frame)
    if results.pose_landmarks:
        mp.solutions.drawing_utils.draw_landmarks(
            frame,
            results.pose_landmarks,
            mp.solutions.pose.POSE_CONNECTIONS,
            mp.solutions.drawing_styles.get_default_pose_landmarks_style()
        )
    cv2.imshow("Detecting poses using the webcam", frame)
    if cv2.waitKey(1) == 27:
        break

cap.release()
cv2.destroyAllWindows()
Note: You can run the above code on your local machine in order to connect to your web camera (if any).
Applications of pose detection
Let's explore a few interesting applications where pose detection technology is put to practical use.
| Application | Explanation |
| --- | --- |
| Human-computer interaction | Enables natural interaction with computers using gestures. |
| Sports analysis | Analyzes athletes' movements to improve performance. |
| Augmented reality | Integrates virtual content based on user body poses. |
| Fitness and health | Monitors and analyzes body postures during exercise. |
| Action recognition | Recognizes human actions from videos. |
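As a sketch of how the fitness and health application could work in practice, exercise repetitions can be counted by tracking a joint angle as it crosses "bent" and "extended" thresholds. The function and the angle sequence below are our own illustration, with made-up threshold values; a real system would feed it angles derived from the detected landmarks frame by frame.

```python
# Count exercise repetitions from a sequence of joint angles (degrees).
# A rep is one full bend (angle below down_thresh) followed by a full
# extension (angle above up_thresh). Thresholds are illustrative.

def count_reps(angles, down_thresh=90, up_thresh=160):
    reps, is_down = 0, False
    for angle in angles:
        if angle < down_thresh:
            is_down = True            # joint fully bent
        elif angle > up_thresh and is_down:
            reps += 1                 # completed one full extension
            is_down = False
    return reps

# Simulated elbow angles over time: two full bend-and-extend cycles.
angles = [170, 120, 80, 100, 165, 150, 85, 130, 170]
print(count_reps(angles))  # 2
```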
Note: Here's the complete list of related projects in MediaPipe or deep learning.
Which MediaPipe function do we use to draw the pose detection landmarks on the image?