TY - GEN
T1 - Enhancing Continuous Sign Language Recognition with Self-Attention and MediaPipe Holistic
AU - Jiang, Yufeng
AU - Li, Fengheng
AU - Li, Zongxi
AU - Liu, Ziwei
AU - Wang, Zijian
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Sign language recognition (SLR) is an interdisciplinary application that combines computer vision and natural language processing. This paper focuses on continuous sign language recognition (CSLR), which refers to recognizing a continuous sequence of sign language sentences, phrases, or words expressed in a short video. Currently, most phrase-level CSLR research relies on recurrent neural networks (RNNs). However, RNNs struggle to capture global dependencies and can model actions only sequentially. Sign language involves complex spatial patterns formed by hand gestures, facial expressions, and body movements; therefore, the global dependency of the spatial features extracted from the video is crucial for this task. In this paper, we introduce a novel pipeline framework for CSLR. We first employ MediaPipe Holistic to extract key points from sign language videos and convert them into sequential input data that the model can process. To overcome the disadvantages of RNNs, we use the self-attention mechanism, which excels at identifying relationships among key points and capturing global dependencies between sign language actions within a sequence. Combining the self-attention model with the extracted key points yields a more effective and efficient solution for CSLR. Additionally, we discuss the selection of MediaPipe Holistic key points, as not all key points contribute equally to recognition. Experimental results show that the proposed pipeline achieves promising performance on the first ten glosses (classes) of the Word-Level American Sign Language (WLASL-10) dataset.
KW - Continuous Sign Language Recognition
KW - MediaPipe
KW - Self-Attention
KW - Sign Language Recognition
KW - Video Classification
UR - https://www.scopus.com/pages/publications/85175032405
U2 - 10.1109/ICA58538.2023.10273118
DO - 10.1109/ICA58538.2023.10273118
M3 - Conference contribution
AN - SCOPUS:85175032405
T3 - Proceedings of the 2023 International Conference on Instrumentation, Control, and Automation, ICA 2023
SP - 97
EP - 102
BT - Proceedings of the 2023 International Conference on Instrumentation, Control, and Automation, ICA 2023
T2 - 8th International Conference on Instrumentation, Control, and Automation, ICA 2023
Y2 - 9 August 2023 through 11 August 2023
ER -