TY - CONF
T1 - SelfME: Self-Supervised Motion Learning for Micro-Expression Recognition
AU - Fan, Xinqi
AU - Chen, Xueli
AU - Jiang, Mingjie
AU - Shahid, Ali Raza
AU - Yan, Hong
PY - 2023
AB - Facial micro-expression (ME) refers to a brief, spontaneous facial movement that can convey a person's genuine emotion. It has numerous applications, including lie detection and criminal analysis. Although deep learning-based ME recognition (MER) methods have achieved considerable success, they still require sophisticated pre-processing with conventional optical flow-based methods to extract facial motions as inputs. To overcome this limitation, we propose a novel MER framework that uses self-supervised learning to extract facial motion for ME (SelfME). To the best of our knowledge, this is the first work with an automatically self-learned motion technique for MER. However, self-supervised motion learning may overlook symmetrical facial actions on the left and right sides of the face when extracting fine features. To tackle this problem, we develop a symmetric contrastive vision transformer (SCViT) that constrains the learning of similar facial action features for the left and right halves of the face. Experiments on two benchmark datasets show that our method achieves state-of-the-art performance, and ablation studies demonstrate its effectiveness.
UR - https://www.mendeley.com/catalogue/6d4e63ab-bcf3-30fc-a7b3-d1a37ae4d109/
DO - 10.1109/cvpr52729.2023.01329
M3 - Paper
ER -