A frame mapping based HMM approach to cross-lingual voice transformation

Cross-lingual voice transformation is challenging when the source language (L1) and the target language (L2) differ greatly in phonetics and prosody. We propose a frame mapping based HMM approach to this problem. The source speaker’s speech data is first warped in frequency toward the target speaker by mapping corresponding formants of selected vowels. The parameter trajectories of the warped data are then “tiled” with frames from the target speaker’s L2 data. The tiled trajectories form a simulated training set of the target speaker in L1, which is used to train an HMM TTS. With a bilingual (Mandarin and English) source speaker and a monolingual (English) target speaker, the frame mapping based approach is capable of generating highly intelligible, good quality speech data in L1 (Mandarin) that sounds rather close to the target speaker. The good performance of the cross-lingual voice transformation is confirmed with speaker similarity, naturalness and intelli...
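
The two steps named in the abstract, formant-based frequency warping and frame "tiling", can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the warping function or the frame distance, so the piecewise-linear warp, the squared Euclidean distance, and the names `make_warp` and `tile_frames` are all assumptions made for illustration.

```python
from bisect import bisect_right


def make_warp(src_formants, tgt_formants, nyquist):
    """Build a piecewise-linear frequency warp (an assumed form, not the paper's).

    The warp passes through (0, 0), each (source formant, target formant)
    pair, and (nyquist, nyquist). Formant values must be distinct and
    lie strictly between 0 and nyquist.
    """
    xs = [0.0] + sorted(src_formants) + [nyquist]
    ys = [0.0] + sorted(tgt_formants) + [nyquist]

    def warp(freq):
        freq = min(max(freq, 0.0), nyquist)
        # Index of the segment [xs[i], xs[i + 1]] that contains freq.
        i = min(bisect_right(xs, freq) - 1, len(xs) - 2)
        t = (freq - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + t * (ys[i + 1] - ys[i])

    return warp


def tile_frames(warped_frames, target_frames):
    """Replace each warped source frame with its nearest target frame.

    Nearest is measured by squared Euclidean distance, which is an
    assumption; the abstract does not name the distance used.
    """
    tiled = []
    for frame in warped_frames:
        best = min(
            target_frames,
            key=lambda cand: sum((a - b) ** 2 for a, b in zip(frame, cand)),
        )
        tiled.append(best)
    return tiled


# Hypothetical formant values (Hz) and toy 2-dimensional feature frames.
warp = make_warp([500.0, 1500.0], [600.0, 1800.0], nyquist=8000.0)
tiled = tile_frames([[0.9, 0.1], [0.1, 1.2]], [[1.0, 0.0], [0.0, 1.0]])
```

In this sketch every output frame is an actual frame from the target speaker's data, so the tiled sequence keeps the source trajectory's shape while using only target-speaker frames. That sequence would then serve as the simulated L1 training set for the HMM TTS described in the abstract.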

Yao Qian, Ji Xu, Frank K. Soong

Cross-lingual Voice Transformation | ICASSP 2011 | Signal Processing | Source Speaker | Target Speaker |

Post Info

Added	20 Aug 2011
Updated	20 Aug 2011
Type	Journal
Year	2011
Where	ICASSP
Authors	Yao Qian, Ji Xu, Frank K. Soong
