In this paper, we propose a prism-based system for capturing multispectral videos. The system consists of a triangular prism, a monochrome camera, and an occlusion mask. Incoming ...
Annotating the regions, text lines and characters of document images is an important, but tedious and expensive task. A ground-truthing tool may largely alleviate the human burden...
Music consists of both local and long-term temporal information. However, for a genre classification task, most of the text categorization based approaches only capture local temp...
Large vocabulary automatic speech recognition (ASR) technologies perform well in known and controlled contexts. In less controlled conditions, however, human review is often neces...
Ideal binary masks are binary patterns that encode the masking characteristics of speech in noise. Recent evidence in speech perception suggests that such binary patterns provide ...