Finger counter using MediaPipe

MediaPipe is an open-source framework developed by Google that provides developers with pre-built, customizable AI solutions. Its pre-trained models make it easier to build applications that process video or audio data in real time.

Expected performance

The finger counter processes either a pre-recorded video or live hand movement from a webcam and displays the number of fingers extended at the current moment. It uses MediaPipe's predefined hand landmarks to check whether each finger is extended and, if so, adds it to the count.

The image below shows the six possible cases, and the model reports the correct count for each of them.

Detecting finger count.

What are landmarks?

Landmarks are predefined points on the hand, such as the wrist, knuckles, and fingertips. MediaPipe tracks these points from frame to frame, and their positions are what the code uses to interpret hand gestures. MediaPipe's hand model detects 21 landmarks, each with a fixed ID from 0 to 20; for example, the index fingertip is 8.

If we consider the fingertips and the wrist, the following are the default MediaPipe landmarks.

Some of the default MediaPipe landmarks.
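
As a minimal sketch, the fingertip and wrist IDs can be written down as plain constants. The IDs follow MediaPipe's hand landmark numbering; the names WRIST and FINGERTIP_IDS are illustrative assumptions, not MediaPipe identifiers.

```python
# Landmark IDs from MediaPipe's hand model (21 landmarks, numbered 0 to 20).
# The constant names below are illustrative, not MediaPipe identifiers.
WRIST = 0
FINGERTIP_IDS = {
    "thumb": 4,
    "index": 8,
    "middle": 12,
    "ring": 16,
    "pinky": 20,
}

# Fingertips are spaced four IDs apart, starting at the thumb tip.
tip_ids = list(FINGERTIP_IDS.values())
print(tip_ids)  # [4, 8, 12, 16, 20]
```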

Let's code this understanding and see if the expected output can be achieved easily.

Required imports

To implement this finger counter in code, we first need to import the following libraries and modules.

import cv2
import time
import os
import mediapipe as mp
import HandTrackingModule as htm
  • cv2: The OpenCV library that is used for computer vision-related tasks.

  • time: Used to measure the time intervals and delays.

  • os: Used to interact with the operating system and access file operations.

  • mediapipe: Used to build computer vision and AI-related applications.

  • HandTrackingModule: The custom module that contains the hand-tracking functions and their implementations.

Example code

In this code, we implement a program that detects whether each finger is extended and displays the total number of extended fingers at the current moment.

import cv2
import mediapipe as mp
import time


class handDetector():
    def __init__(self, mode=False, maxHands=1, detectionCon=0.5, trackCon=0.5):
        self.mode = mode
        self.maxHands = maxHands
        self.detectionCon = detectionCon
        self.trackCon = trackCon

        self.mpHands = mp.solutions.hands
        self.hands = self.mpHands.Hands(static_image_mode=self.mode, max_num_hands=self.maxHands,
                                        min_detection_confidence=self.detectionCon, min_tracking_confidence=self.trackCon)
        self.mpDraw = mp.solutions.drawing_utils

    def findHands(self, img, draw=True):
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imgRGB)
        # print(results.multi_hand_landmarks)

        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                if draw:
                    self.mpDraw.draw_landmarks(img, handLms,
                                               self.mpHands.HAND_CONNECTIONS)
        return img

    def findPosition(self, img, handNo=0, draw=True):

        lmList = []
        if self.results.multi_hand_landmarks:
            myHand = self.results.multi_hand_landmarks[handNo]
            for id, lm in enumerate(myHand.landmark):
                # print(id, lm)
                h, w, c = img.shape
                cx, cy = int(lm.x * w), int(lm.y * h)
                # print(id, cx, cy)
                lmList.append([id, cx, cy])
                if draw:
                    cv2.circle(img, (cx, cy), 15, (255, 0, 255), cv2.FILLED)

        return lmList


def main():
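    pTime = 0  # timestamp of the previous frame, used for the FPS estimate
    cap = cv2.VideoCapture('countFingers.mp4')
    detector = handDetector()
    while True:
        success, img = cap.read()
        if not success:  # end of the video or a failed read
            break
        img = detector.findHands(img)
        lmList = detector.findPosition(img)
        if len(lmList) != 0:
            print(lmList[4])  # landmark 4 is the thumb tip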
        cTime = time.time()
        fps = 1 / (cTime - pTime)
        pTime = cTime

        cv2.putText(img, str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3,
                    (255, 0, 255), 3)

        cv2.imshow("Image", img)
        cv2.waitKey(1)


if __name__ == "__main__":
    main()
Code for finger counter.
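
The findPosition method above turns each landmark into a pixel position. MediaPipe reports lm.x and lm.y as fractions of the image width and height, so the conversion is a multiplication followed by truncation to an integer. The following sketch isolates that step; the helper name to_pixel is hypothetical and does not appear in the module.

```python
def to_pixel(x_norm, y_norm, width, height):
    """Scale normalized landmark coordinates (0.0 to 1.0) to pixel positions."""
    return int(x_norm * width), int(y_norm * height)


# A landmark at the horizontal center, a quarter of the way down a 640x480 frame.
print(to_pixel(0.5, 0.25, 640, 480))  # (320, 120)
```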

Code explanation

FingerCounter.py

It is the main file. It defines the application's appearance, the source of the data being tested, and the finger checks.

  • Lines 1–4: Import all the necessary libraries and modules.

  • Line 6: Set the width and height of the webcam.

  • Lines 8–10: Use VideoCapture() to open the camera and specify the video resolutions.

    • Pass 0 as a parameter to open a webcam.

    • Pass the filename or file link to open a pre-recorded video inside "".

  • Line 12: Initialize a pTime attribute that tracks the time of the previous frame.

  • Line 14: Create a handDetector class instance from the imported HandTrackingModule.

  • Lines 16–17: Define a list containing landmark IDs corresponding to the fingertips and a sum attribute initialized to zero.

  • Lines 19, 59–60: Create a while loop for continuous detection, which terminates when the Enter key is pressed, i.e., key code 13 in ASCII.

  • Lines 20–22: Use read() to capture the frame, call the findHands method to detect the hand and draw its landmarks, and then call the findPosition method to get a list of landmark positions.

  • Lines 24–25: If landmarks are detected in the selected frame, then create a fingers_list list.

  • Lines 28–38: Check the thumb for horizontal extension and the other fingers for vertical extension. For each one, append 1 to the list if it is extended and 0 otherwise.

  • Lines 40–43: Save the count in totalFingers and add it to the sum to track the total detections per frame.

  • Lines 46–48: Create an output box inside the frame that shows the current finger count and keeps changing dynamically.

  • Lines 50–56: Calculate the frames per second and display the value inside the frame, where it keeps changing dynamically.

  • Line 58: Finally, display the image frame using the imshow() method.
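
The finger check described in the steps above can be sketched as follows. This is an illustration under stated assumptions, not necessarily the exact logic of FingerCounter.py: it assumes an upright hand, a landmark list of [id, x, y] entries ordered by ID (the format findPosition returns), and a thumb that points toward larger x values when extended. The names TIP_IDS and count_fingers are hypothetical.

```python
TIP_IDS = [4, 8, 12, 16, 20]  # thumb, index, middle, ring, pinky fingertips


def count_fingers(lm_list):
    """Count extended fingers from [id, x, y] entries ordered by landmark ID.

    Assumes an upright hand whose thumb points toward larger x values;
    for the other hand or a mirrored frame, flip the thumb comparison.
    """
    fingers = []
    # Thumb: compare the tip's x position with the joint just below it.
    thumb_tip, thumb_joint = lm_list[TIP_IDS[0]], lm_list[TIP_IDS[0] - 1]
    fingers.append(1 if thumb_tip[1] > thumb_joint[1] else 0)
    # Other fingers: an extended tip sits above the joint two IDs below it
    # (image y grows downward, so "above" means a smaller y).
    for tip_id in TIP_IDS[1:]:
        tip, joint = lm_list[tip_id], lm_list[tip_id - 2]
        fingers.append(1 if tip[2] < joint[2] else 0)
    return sum(fingers)


# Synthetic hand: every landmark at the same point, then raise two fingertips.
hand = [[i, 100, 200] for i in range(21)]
hand[8][2] = 100   # raise the index fingertip
hand[12][2] = 100  # raise the middle fingertip
print(count_fingers(hand))  # 2
```

The thumb is compared along x rather than y because it extends sideways on an upright hand, which is why the comparison direction depends on which hand is shown and whether the frame is mirrored.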

HandTrackingModule.py

It is the custom module file containing a handDetector class with the functions needed to detect the hand landmarks that the finger counter uses.

  • Lines 7–16: Initialize the properties of the hand tracking and MediaPipe objects.

    • mode: Sets the static image mode. It is False in this example.

    • maxHands: The maximum number of hands to detect. It is 1 in this example.

    • detectionCon: Sets the detection confidence threshold to reduce false positives.

    • trackCon: Sets the tracking confidence threshold to reduce false positives.

    • mpHands: Holds a reference to MediaPipe's hands module.

    • hands: Creates a Hands object to track the hand.

    • mpDraw: Gives access to the drawing utility functions.

  • Lines 18–28: A findHands method that converts the image to RGB format, processes it to detect hands, and draws the landmarks using draw_landmarks if a hand is detected.

  • Lines 30–44: A findPosition method that iterates through the landmarks of the selected hand, converts their normalized coordinates to pixel positions, and stores the positions in the lmList.

  • Lines 47–67: A main method that sets all the variables, creates a handDetector instance, and calls the methods to detect the landmarks and display the FPS (frames per second).
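
The FPS value that main displays is the reciprocal of the time elapsed between two consecutive frames. A minimal sketch of that calculation follows. The helper name compute_fps is hypothetical, and the guard against a zero interval is an addition that the module itself does not include.

```python
import time


def compute_fps(previous_time, current_time):
    """Return frames per second from two frame timestamps, or 0.0 if they coincide."""
    elapsed = current_time - previous_time
    return 1.0 / elapsed if elapsed > 0 else 0.0


previous = time.time()
time.sleep(0.01)  # stand-in for the time spent processing one frame
current = time.time()
print(int(compute_fps(previous, current)))
```

In the loop, the current timestamp becomes the previous one after each frame, so the displayed value tracks the most recent frame interval.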

Code output

Once the code showed the expected response on the live webcam feed, it was tested on a pre-recorded video. The code correctly identified the finger count, as seen in the results below.

Note: Learn more about gesture detection in deep learning.

Test your understanding

Q: What should we do if we want the model to detect the count of two hands?

A) It is not possible.

B) Change maxHands = 1 to maxHands = 2.

C) Add another list to store the tipIds.

Copyright ©2024 Educative, Inc. All rights reserved