Revisiting graphemes with increasing amounts of data

Download www-nlp.stanford.edu

Letter units, or graphemes, have been reported in the literature as a surprisingly effective substitute for the more traditional phoneme units, at least in languages that enjoy a strong correspondence between pronunciation and orthography. For English, however, where letter symbols have less acoustic consistency, previously reported results fell short of systems using highly tuned pronunciation lexicons. Grapheme units simplify system design, but since graphemes map to a wider set of acoustic realizations than phonemes, we should expect grapheme-based acoustic models to require more training data to capture these variations. In this paper, we compare the rate of improvement of grapheme and phoneme systems trained with datasets ranging from 450 to 1200 hours of speech. We consider various grapheme unit configurations, including using letter-specific, onset, and coda units. We show that the grapheme systems improve faster and, depending on the lexicon, reach or surpass the phoneme basel...
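
As a rough illustration of what a grapheme lexicon with position-dependent units might look like, the sketch below spells each word as a sequence of letter units and optionally tags the first and last letters. The abstract does not define the onset and coda units, so the tagging scheme here (word-initial letter as "onset", word-final letter as "coda") and the names `word_to_graphemes` and `build_lexicon` are assumptions made for illustration, not the authors' method.

```python
def word_to_graphemes(word, mark_position=True):
    """Spell a word as a list of grapheme units.

    Assumption: "onset" marks the word-initial letter and "coda" the
    word-final letter. The paper's actual unit definitions may differ.
    """
    # Keep alphabetic characters only, lowercased.
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not mark_position or not letters:
        return letters

    units = list(letters)
    units[0] = letters[0] + "_onset"
    # A one-letter word keeps only the onset tag.
    if len(letters) > 1:
        units[-1] = letters[-1] + "_coda"
    return units


def build_lexicon(words, mark_position=True):
    """Build a word -> grapheme-unit pronunciation dictionary."""
    return {w: word_to_graphemes(w, mark_position) for w in words}


if __name__ == "__main__":
    for word, units in build_lexicon(["speech", "data"]).items():
        print(word, " ".join(units))
```

Unlike a phoneme lexicon, nothing here requires hand-written pronunciations, which is the system-design simplification the abstract refers to.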

Yun-Hsuan Sung, Thad Hughes, Françoise Beaufays, Brian Strope

Grapheme | Grapheme Unit | ICASSP 2009 | Signal Processing | Traditional Phoneme Units |

Added	21 May 2010
Updated	21 May 2010
Type	Conference
Year	2009
Where	ICASSP
Authors	Yun-Hsuan Sung, Thad Hughes, Françoise Beaufays, Brian Strope
