The construction of low-dimensional models explaining highdimensional signal observations provides concise and efficient data representations. In this paper, we focus on pattern ...
We propose a new system to mine visual knowledge on the Web. There are huge image data as well as text data on the Web. However, mining image data from the Web is paid less attent...
Transferring a structure from the visual modality to the aural one presents a difficult challenge. In this work we are experimenting with prosody modeling for the synthesized speec...
Producing caption for the deaf and hearing impaired is a labor intensive task. We implemented a software tool, named SmartCaption, for assisting the caption production process usin...
We introduce the first visual dataset of fast foods with a total of 4,545 still images, 606 stereo pairs, 303 3600 videos for structure from motion, and 27 privacy-preserving vide...