In this paper, we propose a method called temporal correlation support vector machine (TCSVM) for automatic major-minor chord recognition in audio music. The method reduces the influence of the singing voice and takes the temporal correlations of the chord features into account. First, we use robust principal component analysis (RPCA) to separate the singing voice from the music: we expect the low-rank component of the spectrogram matrix to contain the musical accompaniment and the sparse component to contain the vocal signals. Then, we extract a new logarithmic pitch class profile (LPCP) feature, called enhanced LPCP, from the low-rank part. To exploit the temporal correlation among the LPCP features of chords, we propose an improved support vector machine algorithm, TCSVM. We perform this study using the MIREX'09 (Music Information Retrieval Evaluation eXchange) Audio Chord Estimation dataset. We also conduct comprehensive experiments with different pitch class profile feature vectors to examine the performance of TCSVM. The results of our method are comparable to those of the state-of-the-art methods that entered MIREX in 20 on the MIREX'09 Audio Chord Estimation task dataset.

Chord classification is performed once the features have been extracted. Recognition systems typically model this process with template-fitting methods, hidden Markov models (HMMs), or dynamic Bayesian networks. The template-based method has some advantages: it does not require annotated data and has a low computational cost. Its drawbacks, however, are the difficulty of designing templates of chroma vectors and of selecting a distance measure. The HMM is a statistical model, and its parameter estimation requires substantial training data. The recognition rate of the HMM method is typically high, but the method considers only positive training samples and ignores the effect of negative training samples, which greatly limits its discriminative ability. The support vector machine (SVM) method can also achieve a high recognition rate, but it struggles with cross-aliasing between chord classes, which it cannot resolve accurately.

The main difference between HMM and SVM lies in the principle of risk minimization. HMM uses empirical risk minimization, which is the simplest induction principle. In contrast, SVM uses structural risk minimization as its induction principle. This difference gives SVM better generalization performance than HMM.
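The contrast between empirical and structural risk minimization can be made concrete with a plain linear SVM. Below is a minimal sketch in pure Python. It is not the TCSVM algorithm and models no temporal correlation. It computes two quantities:

- the empirical risk, meaning the average hinge loss over the training samples;
- a regularized risk, which adds a capacity penalty (λ/2)‖w‖². This penalty is the usual practical realization of structural risk minimization in SVMs.

The sketch then minimizes the regularized objective with stochastic subgradient descent. Unlike a per-class generative model, the hinge loss is driven by both positive and negative samples. The function names, the toy two-dimensional data, and the hyperparameters (`lam`, `lr`, `epochs`) are hypothetical choices made only for illustration.

```python
import random


def score(w, b, x):
    """Linear decision value w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def empirical_risk(w, b, X, y):
    """Average hinge loss max(0, 1 - y_i * f(x_i)) over the training set."""
    losses = [max(0.0, 1.0 - yi * score(w, b, xi)) for xi, yi in zip(X, y)]
    return sum(losses) / len(losses)


def regularized_risk(w, b, X, y, lam):
    """Empirical risk plus the capacity penalty (lam / 2) * ||w||^2."""
    return empirical_risk(w, b, X, y) + 0.5 * lam * sum(wj * wj for wj in w)


def train_linear_svm(X, y, lam=0.1, epochs=200, lr=0.05, seed=0):
    """Minimize regularized_risk by stochastic subgradient descent (hypothetical helper)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            xi, yi = X[i], y[i]
            # The penalty term always contributes lam * w, shrinking the weights.
            grad_w = [lam * wj for wj in w]
            grad_b = 0.0
            if yi * score(w, b, xi) < 1.0:
                # The hinge term is active only for samples inside the margin.
                grad_w = [gw - yi * xj for gw, xj in zip(grad_w, xi)]
                grad_b = -yi
            w = [wj - lr * gw for wj, gw in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if score(w, b, x) >= 0 else -1


if __name__ == "__main__":
    # Hypothetical toy data: two linearly separable classes.
    X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
    y = [1, 1, 1, -1, -1, -1]
    w, b = train_linear_svm(X, y)
    print("weights:", w, "bias:", b)
    print("empirical risk:", empirical_risk(w, b, X, y))
    print("regularized risk:", regularized_risk(w, b, X, y, 0.1))
```

Setting `lam=0` reduces the objective to pure empirical risk minimization of the hinge loss. A larger `lam` trades training fit for a smaller ‖w‖, that is, a wider margin. This is the capacity control that structural risk minimization refers to.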