Sciweavers

ICASSP
2011
IEEE

Improved F0 modeling and generation in voice conversion

12 years 8 months ago
Improved F0 modeling and generation in voice conversion
F0 is an acoustic feature that varies largely from one speaker to another. F0 is characterized by a discontinuity in the transition between voiced and unvoiced sounds that presents an obstacle to GMM modeling for use in voice conversion. A Multi-Space Distribution (MSD) [5] can be used to model unvoiced and voiced F0 regions in a linearly weighted mixture. However, the use of two incompatible probabilistic spaces, for example a continuous probability density for voiced observations, and a discrete probability for unvoiced observations, may result in an imprecise voiced/unvoiced (v/u) conversion in a maximum likelihood (ML) sense. In this paper we propose to use voicing strength, characterized by the normalized correlation coefficient magnitude, as calculated from F0 feature extraction, as an additional feature for improving F0 modeling and the v/u decision in the context of voice conversion. The proposed method was evaluated on male-to-female voice conversion tasks in both Mandarin a...
Aki Kunikoshi, Yao Qian, Frank K. Soong, Nobuaki M
Added 21 Aug 2011
Updated 21 Aug 2011
Type Journal
Year 2011
Where ICASSP
Authors Aki Kunikoshi, Yao Qian, Frank K. Soong, Nobuaki Minematsu
Comments (0)