Hierarchical Coding for Talking-Head Video

Yu Liu, Shibo Li, Shuyuan Zhu, Siu Kei Au Yeung, Xing Wen, Bing Zeng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Talking-head video is very popular in video conferencing and social media, where the camera captures the movement of the user's head and changes in facial expression. In this paper, we propose a hierarchical coding scheme for the compression of talking-head video. In the proposed method, three data layers (one base layer, one enhancement layer and one feature layer) are formed as the input to the encoder. More specifically, the base layer is generated by spatially sub-sampling the source video, the enhancement layer is composed of specific key frames, and the feature layer is produced from the extracted facial landmarks. These layers are compressed separately but fused together to reconstruct the video signal at the decoder side. To achieve a high-quality reconstruction, we design a multi-feature fusion network in which the feature layer guides the fusion of the base layer and the enhancement layer. Experimental results demonstrate the good performance of the proposed method for the coding of talking-head video.
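
As a rough illustration of how the three input layers described in the abstract might be formed, the following minimal sketch uses plain Python lists as frames. It is not the authors' implementation: the sub-sampling factor, the fixed key-frame interval, and the names `subsample`, `form_layers` and `Layers` are all assumptions made for illustration, and the landmarks are taken as already extracted by some detector. The per-layer compression and the multi-feature fusion network are not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

Frame = List[List[int]]             # grayscale frame: rows of pixel values
Landmarks = List[Tuple[int, int]]   # (x, y) facial landmark positions


@dataclass
class Layers:
    base: List[Frame]           # spatially sub-sampled version of every frame
    enhancement: List[Frame]    # full-resolution key frames only
    feature: List[Landmarks]    # per-frame facial landmarks


def subsample(frame: Frame, factor: int) -> Frame:
    """Keep every `factor`-th pixel in both dimensions (no anti-alias filter)."""
    return [row[::factor] for row in frame[::factor]]


def form_layers(frames: List[Frame],
                landmarks: List[Landmarks],
                factor: int = 2,         # assumed sub-sampling factor
                key_interval: int = 4    # assumed key-frame spacing
                ) -> Layers:
    """Split a clip into base, enhancement and feature layers."""
    if len(frames) != len(landmarks):
        raise ValueError("need one landmark set per frame")
    return Layers(
        base=[subsample(f, factor) for f in frames],
        enhancement=frames[::key_interval],
        feature=list(landmarks),
    )
```

With an 8-frame clip and the default settings, the base layer holds eight quarter-size frames, the enhancement layer holds frames 0 and 4 at full resolution, and the feature layer holds eight landmark sets. A real system would choose key frames and the sub-sampling ratio according to its own rate and quality targets.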

Original language: English
Title of host publication: IEEE International Symposium on Circuits and Systems, ISCAS 2022
Pages: 3043-3047
Number of pages: 5
ISBN (Electronic): 9781665484855
Publication status: Published - 2022
Event: 2022 IEEE International Symposium on Circuits and Systems, ISCAS 2022 - Austin, United States
Duration: 27 May 2022 to 1 Jun 2022

Publication series

Name: Proceedings - IEEE International Symposium on Circuits and Systems
Volume: 2022-May
ISSN (Print): 0271-4310

Conference

Conference: 2022 IEEE International Symposium on Circuits and Systems, ISCAS 2022
Country/Territory: United States
City: Austin
Period: 27/05/22 to 1/06/22

Keywords

  • HEVC
  • Talking-head video
  • coding
  • facial landmarks
  • fusion
