Voice conversion can be reduced to a problem to find a transformation function between the corresponding speech sequences of two speakers. Perhaps the most voice conversions meth...
This paper presents an efficient algorithm for gesture detection in lecture videos by combining visual, speech and electronic slides. Besides accuracy, response time is also cons...
This paper presents a system which automatically generates shallow semantic frame structures for conversational speech in unrestricted domains. We argue that such shallow semantic...
Current state-of-the-art speech recognition systems work quite well in controlled environments but their performance degrades severely in realistic acoustical conditions in reverb...
Large vocabulary automatic speech recognition (ASR) technologies perform well in known and controlled contexts. In less controlled conditions, however, human review is often neces...