Conventional discussion environments provide the technical platform for distributed discussion and collaboration, but apart from some statistical data collected, rarely provide i...
This paper describes the general structure of a fully automated document analysis system for printed documents. The system is based on a character preclassification stage which red...
This paper proposes a novel feature-based invariant descriptor termed Radon composite features (RCFs) for planar shapes. Instead of analyzing shapes directly in the spatial domain,...
A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segm...
Elizabeth Shriberg, Andreas Stolcke, Dilek Z. Hakk...
Latent Dirichlet allocation is a fully generative statistical language model that has proven successful in capturing both the content and the topics of a corpus of docum...