This paper presents an original computational approach to extracting movie tempo for deriving story sections and events that convey the high-level semantics of stories portrayed in...
Recent content-based video retrieval systems combine the output of concept detectors (also known as high-level features) with text obtained through automatic speech recognition. This ...
Robin Aly, Djoerd Hiemstra, Arjen P. de Vries, Fra...
Speech recognition has become common in many application domains, from dictation systems for professional practices to vocal user interfaces for people with disabilities or hands-...
Sabato Marco Siniscalchi, Fulvio Gennaro, Salvator...
In this paper, we present an event parsing algorithm based on Stochastic Context Sensitive Grammar (SCSG) for understanding events, inferring the goal of agents, and predicting th...
Mingtao Pei, School of Computer Science, Yunde Jia...
We introduce a direct model for speech recognition that assumes an unstructured, i.e., flat text output. The flat model allows us to model arbitrary attributes and dependences o...
Georg Heigold, Geoffrey Zweig, Xiao Li, Patrick Ng...