TY - GEN
T1 - A hybrid Gaussian-HMM-deep-learning approach for automatic chord estimation with very large vocabulary
AU - Deng, Junqi
AU - Kwok, Yu-Kwong
N1 - Publisher Copyright: © Junqi Deng and Yu-Kwong Kwok.
PY - 2016
Y1 - 2016
N2 - We propose a hybrid Gaussian-HMM-deep-learning approach for automatic chord estimation (ACE) with a very large chord vocabulary. The Gaussian-HMM part, which is similar to Chordino, serves as a segmentation engine that divides the input audio into note spectrogram segments. Two types of deep learning models are proposed to classify these segments into chord labels, which are then connected into chord sequences. Two sets of evaluations are conducted with two large chord vocabularies. The first evaluation follows a recent MIREX standard procedure. Results show that our approach has a clear advantage on large vocabularies over the state-of-the-art ACE system that supports large vocabularies with inversions, although it is outperformed by that system on small vocabularies. By analyzing the system behaviors behind these results, we observe interesting chord confusion patterns made by the different systems, which plausibly point to a need for more balanced and consistently annotated datasets for training and testing. The second evaluation gives preliminary evidence of our approach’s superiority on a jazz chord vocabulary with 36 chord types, compared with a Chordino-like Gaussian-HMM baseline system with augmented vocabulary capacity.
UR - http://www.scopus.com/inward/record.url?scp=85033574575&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85033574575
T3 - Proceedings of the 17th International Society for Music Information Retrieval Conference, ISMIR 2016
SP - 812
EP - 818
BT - Proceedings of the 17th International Society for Music Information Retrieval Conference, ISMIR 2016
A2 - Mandel, Michael I.
A2 - Devaney, Johanna
A2 - Turnbull, Douglas
A2 - Tzanetakis, George
T2 - 17th International Society for Music Information Retrieval Conference, ISMIR 2016
Y2 - 7 August 2016 through 11 August 2016
ER -