TY - GEN
T1 - American Sign Language Alphabet Recognition with YOLOv5 Enhanced by MediaPipe Hands
AU - Wang, Zijian
AU - Liu, Ziwei
AU - Li, Zongxi
AU - Jiang, Yufeng
AU - Li, Fengheng
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Sign language recognition (SLR) aims to break the communication barrier between hearing-impaired individuals and others, and is beneficial in building an inclusive and caring society. This paper aims to achieve real-time static sign language interpretation from images employing deep learning methods. Conventionally, a pipeline approach is used to locate the position of the hand gesture in an image and then classify the gesture. You Only Look Once version 5 (YOLOv5) is a suitable model that captures the hand gesture with bounding boxes and predicts the gesture with a convolution-based classifier. However, in practice, SLR performance may be significantly affected when gestures are partially occluded or the background environment is complicated. Therefore, this paper proposes a method using MediaPipe Hands to enhance hand features, enabling YOLOv5 to locate the hand position more precisely. In addition, MediaPipe Hands can predict finger positions even when the hand is partly obscured, providing essential features in the classification stage. In experiments, MediaPipe Hands improves the success rate of hand localization in complex environments and increases hand gesture classification accuracy. Compared to the baseline model, the model employing features enhanced by MediaPipe Hands outperforms the model without MediaPipe Hands in static sign language recognition (SSLR). Moreover, our method was tested in real-life scenarios by implementing a web service application and demonstrated improved real-time recognition performance.
AB - Sign language recognition (SLR) aims to break the communication barrier between hearing-impaired individuals and others, and is beneficial in building an inclusive and caring society. This paper aims to achieve real-time static sign language interpretation from images employing deep learning methods. Conventionally, a pipeline approach is used to locate the position of the hand gesture in an image and then classify the gesture. You Only Look Once version 5 (YOLOv5) is a suitable model that captures the hand gesture with bounding boxes and predicts the gesture with a convolution-based classifier. However, in practice, SLR performance may be significantly affected when gestures are partially occluded or the background environment is complicated. Therefore, this paper proposes a method using MediaPipe Hands to enhance hand features, enabling YOLOv5 to locate the hand position more precisely. In addition, MediaPipe Hands can predict finger positions even when the hand is partly obscured, providing essential features in the classification stage. In experiments, MediaPipe Hands improves the success rate of hand localization in complex environments and increases hand gesture classification accuracy. Compared to the baseline model, the model employing features enhanced by MediaPipe Hands outperforms the model without MediaPipe Hands in static sign language recognition (SSLR). Moreover, our method was tested in real-life scenarios by implementing a web service application and demonstrated improved real-time recognition performance.
KW - Image Classification
KW - MediaPipe
KW - Sign Language
KW - Static Sign Language Recognition
KW - YOLOv5
UR - http://www.scopus.com/inward/record.url?scp=85175083373&partnerID=8YFLogxK
U2 - 10.1109/ICA58538.2023.10273099
DO - 10.1109/ICA58538.2023.10273099
M3 - Conference contribution
AN - SCOPUS:85175083373
T3 - Proceedings of the 2023 International Conference on Instrumentation, Control, and Automation, ICA 2023
SP - 103
EP - 108
BT - Proceedings of the 2023 International Conference on Instrumentation, Control, and Automation, ICA 2023
T2 - 8th International Conference on Instrumentation, Control, and Automation, ICA 2023
Y2 - 9 August 2023 through 11 August 2023
ER -