We describe a system for separating multiple sources from a two-channel recording based on interaural cues and prior knowledge of the statistics of the underlying source signals. ...
Ron J. Weiss, Michael I. Mandel, Daniel P. W. Elli...
The taking of turns to speak is an intrinsic property of conversation. It is expected that models of taking turns, providing a prior distribution over conversational form, can red...
This paper describes a simple method for significantly improving Tandem features used to train acoustic models for large-vocabulary speech recognition. The linear activations at ...
Practical supervised learning scenarios involving subjectively evaluated data have multiple evaluators, each giving their noisy version of the hidden ground truth. Majority logic ...
In this paper, we propose a multimodal system for detecting human activity and interaction patterns in a nursing home. Activities of groups of people are first treated as intera...