TY - JOUR
T1 - Recursive likelihood evaluation and fast search algorithm for polynomial segment model with application to speech recognition
AU - Li, Chak Fai
AU - Siu, Man Hung
AU - Au-Yeung, Jeff Siu Kei
N1 - Funding Information:
Because speech is produced by a continuous human articulatory process, there are strong temporal correlations between speech observations. The conditional independence assumption used in HMMs does not adequately capture the temporal correlations. This can be illustrated by the case of a two-mixture Manuscript received January 28, 2005; revised June 24, 2005. This work was supported by CERG, Research Grant Council, Hong Kong SAR, under Grant HKUST6049/00E. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Timothy J. Hazen.
PY - 2006/9
Y1 - 2006/9
N2 - Polynomial segment models (PSMs), which are generalization of the hidden Markov models (HMMs), have opened an alternative research direction for speech recognition. However, they have been limited by their computational complexity. Traditionally, any change in PSM segment boundary requires likelihood recomputation of all the frames within the segment. This makes the PSM's segment likelihood evaluation an order of magnitude more expensive than the HMM's. Furthermore, because recognition using segment models needs to search over all possible segment boundaries, the PSM recognition is computationally unfeasible beyond N-best rescoring. By exploiting the properties of the time normalization in PSM, and by decomposing the PSM segment likelihood into a simple function of "sufficient statistics," in this paper, we show that segment likelihood can be evaluated efficiently in an order of computational complexity similar to HMM. In addition, by reformulating the PSM recognition as a search for the optimal path through a graph, this paper introduces a fast PSM search algorithm that intelligently prunes the number of hypothesized segment boundaries, such that PSM recognition can be performed in an order of complexity similar to HMM. We demonstrate the effectiveness of the proposed algorithms with experiments using a PSM-based recognition system on two different recognition tasks: TIDIGIT digit recognition and the Wall Street Journal dictation task. In both tasks, PSM recognition is feasible and out-performed traditional HMM by more than 14%.
AB - Polynomial segment models (PSMs), which are generalization of the hidden Markov models (HMMs), have opened an alternative research direction for speech recognition. However, they have been limited by their computational complexity. Traditionally, any change in PSM segment boundary requires likelihood recomputation of all the frames within the segment. This makes the PSM's segment likelihood evaluation an order of magnitude more expensive than the HMM's. Furthermore, because recognition using segment models needs to search over all possible segment boundaries, the PSM recognition is computationally unfeasible beyond N-best rescoring. By exploiting the properties of the time normalization in PSM, and by decomposing the PSM segment likelihood into a simple function of "sufficient statistics," in this paper, we show that segment likelihood can be evaluated efficiently in an order of computational complexity similar to HMM. In addition, by reformulating the PSM recognition as a search for the optimal path through a graph, this paper introduces a fast PSM search algorithm that intelligently prunes the number of hypothesized segment boundaries, such that PSM recognition can be performed in an order of complexity similar to HMM. We demonstrate the effectiveness of the proposed algorithms with experiments using a PSM-based recognition system on two different recognition tasks: TIDIGIT digit recognition and the Wall Street Journal dictation task. In both tasks, PSM recognition is feasible and out-performed traditional HMM by more than 14%.
KW - Fast algorithm
KW - Polynomial segment model
KW - Search
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=34047261800&partnerID=8YFLogxK
U2 - 10.1109/TSA.2005.858553
DO - 10.1109/TSA.2005.858553
M3 - Article
AN - SCOPUS:34047261800
SN - 1558-7916
VL - 14
SP - 1704
EP - 1718
JO - IEEE Transactions on Audio, Speech and Language Processing
JF - IEEE Transactions on Audio, Speech and Language Processing
IS - 5
ER -