Grounded language models represent the relationship between words and the non-linguistic context in which they are said. This paper describes how they are learned from large corpo...
We propose a method for human full-body pose tracking from measurements of wearable inertial sensors. Since the data provided by such sensors is sparse, noisy and often ambiguous, ...
In this paper we develop a system for human behaviour recognition in video sequences. Human behaviour is modelled as a stochastic sequence of actions. Actions are described by a f...
Complex human activities occurring in videos can be defined in terms of temporal configurations of primitive actions. Prior work typically hand-picks the primitives, their total...
We introduce an epitomic representation for modeling human activities in video sequences. A video sequence is divided into segments within which the dynamics of objects is assumed...