Learning words from sights and sounds: a computational model

This paper presents an implemented computational model of word acquisition that learns directly from raw multimodal sensory input. Set in an information-theoretic framework, the model acquires a lexicon by finding and statistically modeling consistent cross-modal structure. The model has been implemented in a system using novel speech processing, computer vision, and machine learning algorithms. In evaluations, the model successfully performed speech segmentation, word discovery, and visual categorization from spontaneous infant-directed speech paired with video images of single objects. These results demonstrate the possibility of using state-of-the-art techniques from sensory pattern recognition and machine learning to implement cognitive models that can process raw sensor data without the need for human transcription or labeling.
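The abstract's core idea, scoring lexical candidates by the statistical consistency of cross-modal co-occurrence, can be illustrated with a small information-theoretic sketch. This is not the paper's actual algorithm (which operates on raw audio and video); it is a hypothetical toy in which utterances and images have already been clustered into discrete labels, and mutual information measures how reliably a speech cluster predicts a visual category:

```python
import math
from collections import Counter

def mutual_information(audio_labels, visual_labels):
    """Mutual information (bits) between two aligned discrete label sequences.

    High MI means a speech-segment cluster consistently co-occurs with a
    visual category, i.e. it is a promising word-meaning pairing.
    """
    n = len(audio_labels)
    p_a = Counter(audio_labels)          # marginal counts of audio clusters
    p_v = Counter(visual_labels)         # marginal counts of visual categories
    p_av = Counter(zip(audio_labels, visual_labels))  # joint counts
    mi = 0.0
    for (a, v), c in p_av.items():
        # p(a,v) * log2( p(a,v) / (p(a) * p(v)) ), with counts normalized by n
        mi += (c / n) * math.log2((c * n) / (p_a[a] * p_v[v]))
    return mi

# Hypothetical data: each index is one utterance-image observation.
audio = ["ball", "ball", "dog", "dog", "ball", "dog"]
visual = ["BALL", "BALL", "DOG", "DOG", "BALL", "DOG"]
print(mutual_information(audio, visual))
```

With the perfectly consistent pairing above, the MI equals the full 1 bit of entropy in the visual labels; noisy or inconsistent pairings score lower, which is the sense in which cross-modal structure can rank candidate lexical items.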
Deb Roy, Alex Pentland
Added 17 Dec 2010
Updated 17 Dec 2010
Type Journal
Year 2002