Hand Pose Recognition presents significant challenges that need to be addressed, such as varying lighting conditions or complex backgrounds, which can hinder accurate and robust hand pose estimation. This can be mitigated by employing MediaPipe to facilitate the efficient extraction of representative landmarks from static images combined with the use of Convolutional Neural Networks.
Extracting these landmarks from the hands mitigates the impact of lighting variability or the presence of complex backgrounds. However, the variability of the location and size of the hands is still not addressed by this process. Therefore, the use of processing modules to normalize these points independent of the location of the wrist and the zoom of the hands can significantly mitigate the effects of these variabilities. In all the experiments performed in this work based on American Sign Language alphabet datasets of 870, 27,000, and 87,000 images, the application of the proposed normalizations results in significant improvements in the model performance in a limited resource scenario. Particularly, under conditions of high variability, applying both normalizations resulted in a performance increment of 45.08%, increasing the accuracy from 43.94 ± 0.64% to 89.02 ± 0.40%.
Read the full paper here: