Trainable Videorealistic Speech Animation

We describe how to use machine learning techniques to create a generative, videorealistic speech animation module. A human subject is first recorded with a video camera while uttering a predetermined speech corpus. After the corpus is processed automatically, a visual speech module is learned from the data; it can synthesize the subject's mouth uttering entirely novel utterances that were not recorded in the original video. The synthesized utterance is re-composited onto a background sequence that contains natural head and eye movement. The final output is videorealistic in the sense that it looks like a video camera recording of the subject. At run time, the input to the system can be either real audio sequences or synthetic audio produced by a text-to-speech system, as long as the audio has been phonetically aligned. The two key contributions of this paper are 1) a variant of the multidimensional morphable model (MMM) to synthesize new, previously unseen mouth co...
Tony Ezzat, Gadi Geiger, Tomaso Poggio
Added 20 Aug 2010
Updated 20 Aug 2010
Type Conference
Year 2004
Where FGR