Sciweavers

93 search results - page 16 / 19
DAS
2008
Springer
14 years 11 months ago
A Complete Optical Character Recognition Methodology for Historical Documents
In this paper a complete OCR methodology for recognizing historical documents, either printed or handwritten without any knowledge of the font, is presented. This methodology cons...
Georgios Vamvakas, Basilios Gatos, Nikolaos Stamat...
EMNLP
2010
14 years 7 months ago
Evaluating Models of Latent Document Semantics in the Presence of OCR Errors
Models of latent document semantics such as the mixture of multinomials model and Latent Dirichlet Allocation have received substantial attention for their ability to discover top...
Daniel David Walker, William B. Lund, Eric K. Ring...
SIGIR
2008
ACM
14 years 9 months ago
Optical character recognition errors and their effects on natural language processing
Errors are unavoidable in advanced computer vision applications such as optical character recognition, and the noise induced by these errors presents a serious challenge to downstr...
Daniel P. Lopresti
ACL
1994
14 years 10 months ago
A Corpus-Based Approach to Automatic Compound Extraction
An automatic compound retrieval method is proposed to extract compounds within a text message. It uses n-gram mutual information, relative frequency count and parts of speech as t...
Keh-Yih Su, Ming-Wen Wu, Jing-Shin Chang
TREC
2007
14 years 10 months ago
WIM at TREC 2007
This paper introduces the four tracks in which the WIM-Lab at Fudan University took part at TREC 2007. For the spam track, a multi-centre model was proposed considering the characteristi...
Jun Xu, Jing Yao, Jiaqian Zheng, Qi Sun, Junyu Niu