Gesture Recognition Project

Problem Statement :

Build a deep learning model to detect five gestures from videos captured through a smart TV's webcam. These gestures are used to control TV functionality.
The five gestures and their corresponding TV controls are :

Dataset :

The dataset contains train and val folders, which hold 663 and 100 video frame sequences respectively.
Two CSV files contain the frames and the corresponding class labels.
Each video is made of 30 frames. Frames come in two sizes, 120x160 and 320x320, because they were recorded from two different sources.
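
Because the two sources produce frames of different shapes, the frames have to be brought to a common shape before they can be batched. One possible way to do that, shown here only as a minimal sketch, is to take the largest centered square from each frame and then resize it to a shared target size. The helper name center_crop_box is hypothetical, and reading the sizes as (height, width) is an assumption; the project may have used a different preprocessing step.

```python
def center_crop_box(height, width):
    """Return (top, left, side) of the largest centered square in a frame."""
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return top, left, side


# The two frame sizes from the dataset description, read as (height, width).
# That reading is an assumption; the square crop works for either order.
for h, w in [(120, 160), (320, 320)]:
    top, left, side = center_crop_box(h, w)
    print(f"{h}x{w}: rows {top}:{top + side}, cols {left}:{left + side}")
```

With an image stored as an array, the crop would be frame[top:top + side, left:left + side], followed by a resize to the chosen input size using whichever image library the project relies on.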

Objectives :

Develop a deep learning model that is able to classify the gesture based on the video frames.
The deep learning model should have high accuracy and a low memory footprint (to fit in a webcam's memory, typically < 50 MB).
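
To make the memory constraint concrete, the size of a model's weights can be estimated from its parameter count. The sketch below assumes 32-bit floats (4 bytes per parameter), treats 1 MB as 1024 * 1024 bytes, and ignores file-format overhead and optimizer state. The function names and the example parameter count are hypothetical, not figures from this project.

```python
BYTES_PER_MB = 1024 * 1024


def model_size_mb(num_params, bytes_per_param=4):
    """Rough weight size in MB, assuming every parameter has the same width."""
    return num_params * bytes_per_param / BYTES_PER_MB


def fits_budget(num_params, budget_mb=50, bytes_per_param=4):
    """True if the estimated weight size is strictly under the budget."""
    return model_size_mb(num_params, bytes_per_param) < budget_mb


# Hypothetical parameter count, used only to illustrate the arithmetic.
example_params = 3_500_000
print(f"{model_size_mb(example_params):.2f} MB, fits: {fits_budget(example_params)}")
```

Under these assumptions, a 50 MB budget corresponds to roughly 13.1 million float32 parameters, which is one reason a compact backbone is attractive for this task.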

Solution Approach :

Two model architectures were evaluated.

A final accuracy of 88% was achieved using a retrained MobileNet 2D CNN + RNN.

Conclusion :