Automatic Phoneme Segmentation with Relaxed Textual Constraints

Speech synthesis by unit selection requires the segmentation of a large, single-speaker, high-quality recording. Automatic speech recognition techniques, e.g. Hidden Markov Models (HMMs), can be optimised for maximum segmentation accuracy. This paper presents the results of tuning such a phoneme segmentation system. Firstly, using no text transcription, the design of an HMM phoneme recogniser is optimised subject to a phoneme bigram language model. Optimal performance is obtained with triphone models, 7 states per phoneme and 5 Gaussians per state, reaching 94.4% phoneme recognition accuracy with 95.2% of phoneme boundaries within 70 ms of hand-labelled boundaries. Secondly, using the textual information modelled by a multi-pronunciation phonetic graph built according to errors found in the first step, the reported phoneme recognition accuracy increases to 96.8% with 96.1% of phoneme boundaries within 70 ms of hand-labelled boundaries. Finally, the results from these two segmentation meth...
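
The two figures quoted in the abstract (phoneme recognition accuracy and the share of boundaries within 70 ms of the hand labels) can be illustrated with a minimal sketch. The listing does not say how the paper scores either figure, so everything here is an assumption: accuracy is taken as one minus the edit distance divided by the reference length, boundaries are assumed to be paired one-to-one and given in seconds, and the function names `boundary_agreement`, `edit_distance` and `phoneme_accuracy` are hypothetical.

```python
from typing import Sequence


def boundary_agreement(auto: Sequence[float], hand: Sequence[float],
                       tol: float = 0.070) -> float:
    """Fraction of boundaries whose automatic time is within tol seconds
    of the hand-labelled time. Assumes one-to-one pairing (an assumption,
    not something the abstract states)."""
    if len(auto) != len(hand):
        raise ValueError("boundary lists must have the same length")
    if not hand:
        raise ValueError("no boundaries to compare")
    hits = sum(1 for a, h in zip(auto, hand) if abs(a - h) <= tol)
    return hits / len(hand)


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions
    needed to turn ref into hyp (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Assumed definition: (N - S - D - I) / N with N = len(ref)."""
    if not ref:
        raise ValueError("reference sequence is empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Made-up illustrative values, not data from the paper.
    hand = [0.10, 0.25, 0.40, 0.62]
    auto = [0.12, 0.24, 0.50, 0.61]
    print(boundary_agreement(auto, hand))
    print(phoneme_accuracy(["k", "ae", "t", "s"], ["k", "eh", "t"]))
```

Real scoring tools may weight substitutions, deletions and insertions differently when aligning the two sequences, so values from this sketch need not match theirs exactly.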

Pierre Lanchantin, Andrew C. Morris, Xavier Rodet, Christophe Veaux

Education | LREC 2008 | Phoneme Boundaries | Phoneme Recognition Accuracy | Phoneme Segmentation System |

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	LREC
Authors	Pierre Lanchantin, Andrew C. Morris, Xavier Rodet, Christophe Veaux
