MediaPipe is a framework developed by Google that provides developers with pre-built and customizable AI solutions. By offering open-source pre-trained models, it enables developers to create applications that require real-time processing of video or audio data.
The finger counter model processes either a pre-recorded video or live hand movement captured via webcam and displays the number of fingers extended at the current moment. The model relies on pre-defined landmarks to check whether each finger is extended and adds it to the count.
The image below shows the six possible cases (zero to five extended fingers), and the model gives the correct count for each of them.
Landmarks are defined points on the hand, such as the knuckles, palm, and fingertips. These points are used to track hand gestures and make observations based on training and pre-existing data. By default, MediaPipe assigns standard identification values to specific landmarks; for example, the index fingertip is landmark 8.
If we consider the fingertips and the wrist, the following are the default MediaPipe landmarks.
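These defaults can be written down as a small lookup table. The IDs below are fixed by MediaPipe's 21-point hand model; the names `WRIST` and `TIP_IDS` are just illustrative labels, not MediaPipe identifiers.

```python
# Default MediaPipe hand-landmark IDs for the wrist and the five fingertips.
WRIST = 0
TIP_IDS = {
    "thumb": 4,
    "index": 8,
    "middle": 12,
    "ring": 16,
    "pinky": 20,
}
```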
Let's put this understanding into code and see whether the expected output can be achieved.
To implement this finger counter in code, we first need to import the following libraries and modules.
```python
import cv2
import time
import os
import mediapipe as mp
import HandTrackingModule as htm
```
cv2: The OpenCV library, used for computer vision tasks.
time: Used to measure time intervals and delays.
os: Used to interact with the operating system and perform file operations.
mediapipe: Used to build computer vision and AI applications.
HandTrackingModule: The custom module that contains the gesture-tracking functions and implementations.
In this code, we implement a model that detects whether a finger is extended and automatically shows the count of all fingers extended at the current moment.
```python
import cv2
import mediapipe as mp
import time


class handDetector():
    def __init__(self, mode=False, maxHands=1, detectionCon=0.5, trackCon=0.5):
        self.mode = mode
        self.maxHands = maxHands
        self.detectionCon = detectionCon
        self.trackCon = trackCon

        self.mpHands = mp.solutions.hands
        self.hands = self.mpHands.Hands(self.mode, self.maxHands,
                                        self.detectionCon, self.trackCon)
        self.mpDraw = mp.solutions.drawing_utils

    def findHands(self, img, draw=True):
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imgRGB)
        # print(results.multi_hand_landmarks)
        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                if draw:
                    self.mpDraw.draw_landmarks(img, handLms,
                                               self.mpHands.HAND_CONNECTIONS)
        return img

    def findPosition(self, img, handNo=0, draw=True):
        lmList = []
        if self.results.multi_hand_landmarks:
            myHand = self.results.multi_hand_landmarks[handNo]
            for id, lm in enumerate(myHand.landmark):
                # print(id, lm)
                h, w, c = img.shape
                cx, cy = int(lm.x * w), int(lm.y * h)
                # print(id, cx, cy)
                lmList.append([id, cx, cy])
                if draw:
                    cv2.circle(img, (cx, cy), 15, (255, 0, 255), cv2.FILLED)
        return lmList


def main():
    pTime = 0
    cTime = 0
    cap = cv2.VideoCapture('countFingers.mp4')
    detector = handDetector()
    while True:
        success, img = cap.read()
        img = detector.findHands(img)
        lmList = detector.findPosition(img)
        if len(lmList) != 0:
            print(lmList[4])
        cTime = time.time()
        fps = 1 / (cTime - pTime)
        pTime = cTime
        cv2.putText(img, str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3,
                    (255, 0, 255), 3)
        cv2.imshow("Image", img)
        cv2.waitKey(1)


if __name__ == "__main__":
    main()
```
FingerCounter.py
This is the main file. It specifies the application's appearance, the source of the data being tested, and the finger-extension checks.
Lines 1–4: Import all the necessary libraries and modules.
Line 6: Set the width and height of the webcam.
Lines 8–10: Use VideoCapture() to open the camera and specify the video resolution. Pass 0 as a parameter to open a webcam, or pass a filename or file link inside "" to open a pre-recorded video.
Line 12: Initialize a pTime attribute that tracks the time of the previous frame.
Line 14: Create a handDetector class instance from the imported HandTrackingModule.
Lines 16–17: Define a list containing landmark IDs corresponding to the fingertips and a sum attribute initialized to zero.
Lines 19, 59–60: Create a while loop for continuous detection, which terminates when the enter key (13 in ASCII) is pressed.
Lines 20–22: Use read() to capture the image, call the findHands method to detect the hand and track landmarks, and then call the findPosition method to get a list of landmarks.
Lines 24–25: If landmarks are detected in the current frame, create a fingers_list list.
Lines 28–38: If the thumb is extended horizontally, or a finger is extended vertically, append one to fingers_list; otherwise, append zero.
Lines 40–43: Save the count in totalFingers and add it to the sum to track the total detections per frame.
Lines 46–48: Create an output box inside the frame that shows the current finger count and updates dynamically.
Lines 50–56: Calculate the frame rate (frames per second), which is displayed inside the frame and updates dynamically.
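The FPS computation reduces to a time difference between consecutive frames; a minimal sketch, with a sleep standing in for the per-frame processing:

```python
import time

# FPS = 1 / (time elapsed between the previous and the current frame).
pTime = time.time()
time.sleep(0.05)           # stand-in for processing one frame
cTime = time.time()
fps = 1 / (cTime - pTime)
pTime = cTime              # remember this frame's timestamp for the next one
```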
Line 58: Finally, display the image frame using the imshow() method.
HandTrackingModule.py
This is the custom module file containing a handDetector class with all the functions needed to detect landmarks and count fingers.
Lines 7–16: Initialize the hand-tracking properties and the MediaPipe objects.
| Property | Description |
| --- | --- |
| mode | Sets the static image mode. It is false in this example. |
| maxHands | The number of hands to detect. It is 1 in this example. |
| detectionCon | Sets the detection confidence threshold to minimize false positives. |
| trackCon | Sets the tracking confidence threshold to minimize false positives. |
| mpHands | Holds the MediaPipe hands module. |
| hands | Creates a Hands object using the properties above. |
| mpDraw | Gives access to the drawing utility functions. |
Lines 18–28: A findHands method that converts the image to RGB format and, if a hand is detected, draws the landmarks using draw_landmarks.
Lines 30–44: A findPosition method that iterates through the hand's landmarks, converts their normalized coordinates to pixel positions, and stores them in lmList.
Lines 47–67: A main method that sets the variables, creates a handDetector instance, and calls the methods to make observations and display the count and frame rate.
Once the code produced the expected response on the live webcam feed, it was tested on a pre-recorded video. The code correctly identified the finger count, as seen in the results below.
Note: Learn more about gesture detection in deep learning.
What should we do if we want the model to detect the count of two hands?
It is not possible.
Change maxHands = 1 to maxHands = 2.
Add another list to store the tipIds.