Sciweavers

» Indexing Text Documents Based on Topic Identification
ICDE
2007
IEEE
15 years 6 months ago
Document Representation and Dimension Reduction for Text Clustering
Increasingly large text datasets and the high dimensionality associated with natural language create a great challenge in text mining. In this research, a systematic study is cond...
M. Mahdi Shafiei, Singer Wang, Roger Zhang, Evange...
CIKM
2005
Springer
15 years 1 months ago
Fast on-line index construction by geometric partitioning
Inverted index structures are the mainstay of modern text retrieval systems. They can be constructed quickly using off-line merge-based methods, and provide efficient support for ...
Nicholas Lester, Alistair Moffat, Justin Zobel
SIGIR
1999
ACM
15 years 4 months ago
Probabilistic Latent Semantic Indexing
Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fit...
Thomas Hofmann
DAS
2006
Springer
15 years 3 months ago
Writer Identification for Smart Meeting Room Systems
In this paper we present a text-independent on-line writer identification system based on Gaussian Mixture Models (GMMs). This system has been developed in the context of...
Marcus Liwicki, Andreas Schlapbach, Horst Bunke, S...
ICDAR
1997
IEEE
15 years 4 months ago
Representing OCRed documents in HTML
OCR is an error-prone process. It is time-consuming and expensive to manually proofread OCR results. The errors remaining in OCRed texts can cause serious problems in rea...
Tao Hong, Sargur N. Srihari