Soft frame margin estimation of Gaussian Mixture Models for speaker recognition with sparse training data



Discriminative Training (DT) methods for acoustic modeling, such as MMI, MCE, and SVM, have proven effective in speaker recognition. In this paper we propose a DT method for GMMs using soft frame margin estimation. Unlike other DT methods such as MMI or MCE, soft frame margin estimation aims to improve the ability of a GMM to generalize to unseen data when there is a mismatch between the training data and the unseen data. We define an objective function that integrates a multi-class separation frame margin and a loss function, both expressed as functions of GMM likelihoods. We propose to optimize this objective function with semidefinite programming, a convex optimization technique. Our experimental results show that the proposed soft frame margin discriminative training with semidefinite programming optimization (SFMESDP) is very effective for robust speaker model training when only limited amounts of training data are available.
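The abstract does not state the objective function in closed form. The sketch below shows only how a frame margin and a soft (hinge-style) loss can both be written as functions of GMM likelihoods. It rests on several assumptions that are not taken from the paper: diagonal-covariance GMMs, a per-frame margin defined as the target speaker's log-likelihood minus that of the best competing speaker, and a hinge loss against a fixed margin `rho`. The names `DiagGMM`, `frame_margin`, and `soft_margin_loss` are hypothetical. The semidefinite programming step used to optimize the GMM parameters is not shown.

```python
import math
from typing import List, Sequence


class DiagGMM:
    """Hypothetical diagonal-covariance GMM: weights, per-component means and variances."""

    def __init__(self, weights: Sequence[float], means: Sequence[Sequence[float]],
                 variances: Sequence[Sequence[float]]):
        self.weights = list(weights)
        self.means = [list(m) for m in means]
        self.variances = [list(v) for v in variances]

    def log_likelihood(self, x: Sequence[float]) -> float:
        """log p(x | GMM) for one frame, combining components with log-sum-exp."""
        comps = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                # Log density of one diagonal Gaussian dimension.
                ll += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
            comps.append(ll)
        m = max(comps)
        return m + math.log(sum(math.exp(c - m) for c in comps))


def frame_margin(x: Sequence[float], models: List[DiagGMM], target: int) -> float:
    """Assumed multi-class separation margin: target log-likelihood minus best competitor."""
    target_ll = models[target].log_likelihood(x)
    competitor_ll = max(m.log_likelihood(x) for i, m in enumerate(models) if i != target)
    return target_ll - competitor_ll


def soft_margin_loss(frames: Sequence[Sequence[float]], labels: Sequence[int],
                     models: List[DiagGMM], rho: float) -> float:
    """Mean hinge loss max(0, rho - d) over frames; frames with margin >= rho cost nothing."""
    losses = [max(0.0, rho - frame_margin(x, models, y)) for x, y in zip(frames, labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Two toy one-dimensional "speaker" models with means -1 and +1 and unit variance.
    speakers = [DiagGMM([1.0], [[-1.0]], [[1.0]]), DiagGMM([1.0], [[1.0]], [[1.0]])]
    frames = [[0.0], [0.5], [1.0]]
    labels = [1, 1, 1]
    print([frame_margin(x, speakers, y) for x, y in zip(frames, labels)])
    print(soft_margin_loss(frames, labels, speakers, rho=1.5))
```

In this toy example the frame margins are analytically 0, 1, and 2, so with `rho = 1.5` the hinge losses are 1.5, 0.5, and 0, and their mean is 2/3. A frame that already clears the margin contributes nothing, which is the sense in which the loss is "soft": training effort concentrates on frames that are misclassified or only narrowly separated.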

Yan Yin, Qi Li
