Google Open-Sources New Real-Time Hand Gesture-Tracking ML Pipeline

Robust real-time hand perception is one of the more challenging problems in computer vision today. Human hands are flexible and highly articulated, and they lack high-contrast patterns, which makes them hard to detect and track reliably. The problem is worth solving because reliable hand gesture recognition enables robust interaction between humans and machines.

This week, Google announced the release of a new approach to hand perception: an ML pipeline for hand tracking and gesture recognition. The company previewed the technique in June this year at the Computer Vision and Pattern Recognition (CVPR) conference 2019.

Behind the Architecture

The hand perception functionality is implemented in MediaPipe, an open-source cross-platform framework for building pipelines that process perceptual data of different modalities, such as video and audio. It provides high-fidelity hand and finger tracking by using ML to infer 21 3D keypoints of a hand from a single frame.

The hand tracking solution uses a machine learning pipeline made up of three models:

  • BlazePalm: A single-shot detector model that operates on the full image and returns an oriented hand bounding box. The model follows three strategies: training a palm detector instead of a hand detector, using an encoder-decoder feature extractor for greater scene-context awareness even for small objects, and minimising the focal loss during training. Detecting the palm rather than the whole hand helps the model cope with occluded and self-occluded hands. So far, this method has achieved an average precision of 95.7% in palm detection.
  • Hand Landmark Model: This model operates on the cropped image region defined by the palm detector and performs precise keypoint localisation of 21 3D hand-knuckle coordinates inside the detected hand regions via regression. To obtain robust hand poses, the researchers manually annotated approximately 30,000 real-world images with 21 3D coordinates. A high-quality synthetic hand model was also rendered over various backgrounds and mapped to the corresponding 3D coordinates.
  • Gesture Recogniser: This model classifies the previously computed keypoint configuration into a discrete set of gestures. The researchers used the accumulated angles of the joints to determine the state of each finger (for example, bent or straight), and the set of finger states is then mapped to a set of pre-defined gestures. This technique allowed the researchers to estimate basic static gestures with reasonable quality.
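
The gesture-recognition step described above can be illustrated with a minimal Python sketch. It is not Google's implementation. It assumes a keypoint layout in which index 0 is the wrist and each finger contributes four consecutive joints, and the bend threshold, the gesture table and the `synthetic_hand` helper are all hypothetical values chosen for illustration only.

```python
import math

# Assumed layout of the 21 keypoints: index 0 is the wrist, followed by
# four joints per finger, ordered from the base of the finger to its tip.
FINGERS = {
    "thumb": (1, 2, 3, 4),
    "index": (5, 6, 7, 8),
    "middle": (9, 10, 11, 12),
    "ring": (13, 14, 15, 16),
    "little": (17, 18, 19, 20),
}

# Hypothetical threshold: accumulated bend (degrees) above which a finger
# is treated as bent rather than straight.
BEND_THRESHOLD_DEG = 60.0

# Hypothetical gesture table keyed by the set of straight fingers.
GESTURES = {
    frozenset(): "fist",
    frozenset({"index"}): "pointing",
    frozenset({"index", "middle"}): "victory",
    frozenset({"thumb"}): "thumbs_up",
    frozenset(FINGERS): "open_palm",
}


def bend_angle(a, b, c):
    """Degrees by which the path a -> b -> c deviates from a straight line."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return 0.0
    cosine = sum(x * y for x, y in zip(u, v)) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def finger_states(keypoints):
    """Map each finger name to True (straight) or False (bent)."""
    states = {}
    for name, (j1, j2, j3, j4) in FINGERS.items():
        # Accumulate the bend at the two inner joints of the finger.
        total = (bend_angle(keypoints[j1], keypoints[j2], keypoints[j3])
                 + bend_angle(keypoints[j2], keypoints[j3], keypoints[j4]))
        states[name] = total < BEND_THRESHOLD_DEG
    return states


def classify_gesture(keypoints):
    """Look up the set of straight fingers in the gesture table."""
    straight = frozenset(n for n, ok in finger_states(keypoints).items() if ok)
    return GESTURES.get(straight, "unknown")


def synthetic_hand(extended):
    """Build 21 toy keypoints with the named fingers straight, others curled."""
    keypoints = [(0.0, 0.0, 0.0)]  # wrist
    for x, name in enumerate(FINGERS):
        if name in extended:
            joints = [(x, 1.0, 0.0), (x, 2.0, 0.0), (x, 3.0, 0.0), (x, 4.0, 0.0)]
        else:
            joints = [(x, 1.0, 0.0), (x, 2.0, 0.0), (x, 2.0, -1.0), (x, 1.0, -1.0)]
        keypoints.extend(joints)
    return keypoints
```

Calling `classify_gesture(synthetic_hand({"index", "middle"}))` on the toy data returns `"victory"`. A real system would tune the threshold per finger and would receive its keypoints from the landmark model rather than from a synthetic helper.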

As mentioned earlier, this approach is implemented in the MediaPipe framework. With this cross-platform framework, the hand perception pipeline can be built as a directed graph of modular components, known as Calculators. One important optimisation the framework enables is that the palm detector runs only when necessary, which saves a significant amount of computation time.
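
The idea of running the palm detector only when necessary can be sketched in Python as follows. This is an illustrative sketch, not the MediaPipe API: the `HandTracker` class, the confidence threshold, the re-detection rule (reuse the region derived from the previous frame's keypoints until the landmark model's confidence drops) and the two `fake_*` stand-in models are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Point = Tuple[float, float, float]       # (x, y, z)


def bounding_box(keypoints: List[Point]) -> Box:
    """Axis-aligned box around the keypoints, used as the next frame's crop."""
    xs = [p[0] for p in keypoints]
    ys = [p[1] for p in keypoints]
    return (min(xs), min(ys), max(xs), max(ys))


@dataclass
class HandTracker:
    # Both models are injected as plain callables (hypothetical interfaces).
    detect_palm: Callable[[dict], Optional[Box]]
    predict_landmarks: Callable[[dict, Box], Tuple[List[Point], float]]
    min_confidence: float = 0.5   # assumed threshold for "hand still present"
    region: Optional[Box] = None  # crop carried over from the previous frame
    detector_runs: int = 0        # counts how often the detector was needed

    def process(self, frame: dict) -> Optional[List[Point]]:
        # Run the comparatively expensive detector only when no region is tracked.
        if self.region is None:
            self.detector_runs += 1
            self.region = self.detect_palm(frame)
            if self.region is None:
                return None
        keypoints, confidence = self.predict_landmarks(frame, self.region)
        if confidence < self.min_confidence:
            # Hand lost: drop the region so the next frame triggers detection.
            self.region = None
            return None
        # Derive the next frame's region from the current keypoints.
        self.region = bounding_box(keypoints)
        return keypoints


# Hypothetical stand-ins for the real models, driven by a flag in the frame.
def fake_detect_palm(frame: dict) -> Optional[Box]:
    return (0.0, 0.0, 1.0, 1.0) if frame["hand_visible"] else None


def fake_predict_landmarks(frame: dict, region: Box) -> Tuple[List[Point], float]:
    keypoints = [(i / 20.0, i / 20.0, 0.0) for i in range(21)]
    return keypoints, (0.9 if frame["hand_visible"] else 0.1)


def run_demo():
    """Process six toy frames; the hand disappears in the fourth frame."""
    tracker = HandTracker(fake_detect_palm, fake_predict_landmarks)
    visibility = [True, True, True, False, True, True]
    results = [tracker.process({"hand_visible": v}) for v in visibility]
    return results, tracker.detector_runs
```

In this toy run the detector is invoked twice across six frames: once on the first frame and once after the hand is lost in the fourth frame. The other frames reuse the region computed from the previous keypoints.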

How It Is Different

This approach differs from existing state-of-the-art approaches, which rely primarily on powerful desktop environments for inference. The new approach not only achieves real-time performance on a mobile phone but also scales to multiple hands.

Applications of This New Approach

The hand perception approach can be used in a variety of cases, a few of which are listed below:

  • It can enable the overlay of digital content and information on top of the physical world in augmented reality.
  • It can help differently-abled people by supporting sign language understanding.
  • It can be applied in VR systems to manipulate virtual objects.
  • It can also be used to control intelligent robots.

Ambika Choudhury

A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.