Semantic speech editing

13 years 5 days ago
Semantic speech editing
Editing speech data is currently time-consuming and errorprone. Speech editors rely on acoustic waveform representations, which force users to repeatedly sample the underlying speech to identify words and phrases to edit. Instead we developed a semantic editor that reduces the need for extensive sampling by providing access to meaning. The editor shows a time-aligned errorful transcript produced by applying automatic speech recognition (ASR) to the original speech. Users visually scan the words in the transcript to identify important phrases. They then edit the transcript directly using standard word processing `cut and paste' operations, which extract the corresponding time-aligned speech. ASR errors mean that users must supplement what they read in the transcript by accessing the original speech. Even when there are transcript errors, however, the semantic representation still provides users with enough information to target what they edit and play, reducing the need for extens...
Steve Whittaker, Brian Amento
Added 01 Dec 2009
Updated 01 Dec 2009
Type Conference
Year 2004
Where CHI
Authors Steve Whittaker, Brian Amento
Comments (0)