Traditional wisdom holds that once documents are turned into bag-of-words (unigram count) vectors, word order is completely lost. We introduce an approach that, perhaps surprisi...
Xiaojin Zhu, Andrew B. Goldberg, Michael Rabbat, R...
This work belongs to the domain of Electronic Document Management (EDM) [1]. A document can be an electronic text, an image, a sound file, a network protocol message, a set of da...
We propose a hierarchical approach to document categorization that requires no pre-configuration and maps the semantic document space to a predefined taxonomy. The utilizatio...
Robert Wetzker, Tansu Alpcan, Christian Bauckhage,...
This paper presents a model for describing the synchronization between several media delivered over a network in a Web-based environment. Synchronization concerns the download and...
This paper describes an algorithm for determining the content type of a given zone within a document image. We take a statistics-based approach and represent each zone ...