Motion generation is a research field concerned with producing movements, gestures, or animations with computer systems. It relies on mathematical algorithms to create dynamic, natural, and fluid motions that simulate human movement, enhancing human–computer interaction. Motion generation has many applications, such as animating virtual characters in video games or films, controlling robots, and representing sign language in communication systems for deaf people. In sign language communication systems, motion generation increases the naturalness of interactive avatars or virtual assistants that respond with sign language output. Greater naturalness, in turn, enables friendlier communication between deaf and hearing people. Such systems strengthen the communication capabilities of deaf individuals, encouraging inclusivity and facilitating their integration into diverse social and professional settings.
Most state-of-the-art sign language motion generation systems are based on expert rules or prerecorded movements. This work proposes training a module that automatically generates sign language motion from sign phonemes represented in HamNoSys [1]. The proposed generation system is based on deep learning and follows a transformer-based approach [2]. HamNoSys is a phonetic transcription system for sign languages that encodes sign characteristics, or phonemes, such as hand location, shape, and movement.
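To make the setup concrete, the sketch below shows one way a transformer encoder–decoder can map a sequence of HamNoSys phoneme tokens to a sequence of landmark frames. It is a minimal PyTorch illustration, not the exact architecture from the paper; the class name `Phoneme2MotionTransformer`, the vocabulary size, the number of landmarks, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch (not the authors' exact architecture): a transformer
# encoder-decoder that maps HamNoSys phoneme token IDs to 2D landmark frames.
# Vocabulary size, landmark count, and hyperparameters are assumptions.
import torch
import torch.nn as nn

class Phoneme2MotionTransformer(nn.Module):
    def __init__(self, vocab_size=200, n_landmarks=67, d_model=256,
                 nhead=4, num_layers=3, max_len=200):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        # Project each landmark frame (x, y per landmark) into the model dimension.
        self.frame_in = nn.Linear(n_landmarks * 2, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True)
        self.frame_out = nn.Linear(d_model, n_landmarks * 2)

    def forward(self, phonemes, prev_frames):
        # phonemes: (batch, src_len) int64 HamNoSys token IDs
        # prev_frames: (batch, tgt_len, n_landmarks * 2) previously generated frames
        src_pos = torch.arange(phonemes.size(1), device=phonemes.device)
        tgt_pos = torch.arange(prev_frames.size(1), device=prev_frames.device)
        src = self.token_emb(phonemes) + self.pos_emb(src_pos)
        tgt = self.frame_in(prev_frames) + self.pos_emb(tgt_pos)
        # Causal mask so each frame only attends to earlier frames.
        causal = self.transformer.generate_square_subsequent_mask(
            prev_frames.size(1)).to(phonemes.device)
        dec = self.transformer(src, tgt, tgt_mask=causal)
        return self.frame_out(dec)  # next-frame landmark predictions
```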
To the best of the authors’ knowledge, the proposed system is the first motion generation system for sign language based on transformers. The main contributions of this paper are the following:
- Proposal and evaluation of a deep learning architecture based on transformers for generating the sequence of landmarks to represent a sign.
- A stop module, integrated into the proposed approach, that decides when the generation process should end; this module is also evaluated in different scenarios (a minimal sketch is given after this list).
- Additional analyses for improving system accuracy, considering different padding strategies, interpolation approaches, and data augmentation techniques (an illustrative interpolation sketch also follows this list).
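The stop module mentioned above can be pictured as a small classification head that runs over the decoder outputs and predicts, for each generated frame, the probability that the sign has ended. The sketch below is a hedged illustration under that assumption; the class name `StopModule`, the hidden size, and the 0.5 threshold are not taken from the paper.

```python
# Minimal sketch of a stop module (names, sizes, and threshold are
# illustrative assumptions): a small head on top of the decoder states that
# predicts, per generated frame, the probability that the sign has finished.
import torch
import torch.nn as nn

class StopModule(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(d_model, 64), nn.ReLU(),
                                  nn.Linear(64, 1))

    def forward(self, decoder_states):
        # decoder_states: (batch, tgt_len, d_model)
        # Returns per-frame stop probabilities in [0, 1].
        return torch.sigmoid(self.head(decoder_states)).squeeze(-1)

# During autoregressive generation, decoding would halt once the stop
# probability of the newest frame exceeds a chosen threshold, e.g. 0.5.
```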
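As one illustration of the kind of sequence preprocessing the last contribution refers to, the following sketch linearly resamples a variable-length landmark sequence to a fixed number of frames. This is an assumption about what such interpolation can look like, not necessarily the paper's specific approach; `resample_landmarks` is a hypothetical helper.

```python
# Illustrative sketch only (an assumption, not necessarily the paper's exact
# preprocessing): linearly interpolate a variable-length landmark sequence to
# a fixed number of frames so that all training samples share one length.
import numpy as np

def resample_landmarks(frames: np.ndarray, target_len: int) -> np.ndarray:
    """frames: (num_frames, num_landmarks, 2) -> (target_len, num_landmarks, 2)."""
    src_t = np.linspace(0.0, 1.0, num=frames.shape[0])
    tgt_t = np.linspace(0.0, 1.0, num=target_len)
    flat = frames.reshape(frames.shape[0], -1)            # (num_frames, num_landmarks*2)
    out = np.stack([np.interp(tgt_t, src_t, flat[:, i])   # interpolate each coordinate
                    for i in range(flat.shape[1])], axis=1)
    return out.reshape(target_len, *frames.shape[1:])
```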
Read the full paper here: https://www.mdpi.com/1424-8220/23/23/9365
References:
1. Hanke, T. HamNoSys—Representing sign language data in language resources and language processing contexts. LREC 2004, 5, 1–6.
2. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.