Natural sounds are structured on many time-scales. A typical segment of speech, for example, contains features that span four orders of magnitude: Sentences (∼1 s); phonemes (...
This paper presents a decentralized approach to multiple target tracking. The novelty of this approach lies in the use of a set of autonomous while collaborative trackers to overc...
So far extending light field rendering to dynamic scenes has been trivially treated as the rendering of static light fields stacked in time. This type of approaches requires inp...
Abstract. We address the problem of learning good features for understanding video data. We introduce a model that learns latent representations of image sequences from pairs of su...
Great effort has been made to improve video concept detection and continuous progress has been reported. With the current evaluation method being confined to carefully annotated d...