Sciweavers

298 search results - page 47 / 60
» An information-theoretic measure for document similarity
WWW
2006
ACM
16 years 13 days ago
Discovering event evolution graphs from newswires
In this paper, we propose an approach to automatically mine event evolution graphs from newswires on the Web. An event evolution graph is a directed graph in which the vertices and e...
Christopher C. Yang, Xiaodong Shi
KDD
2004
ACM
16 years 5 days ago
Improved robustness of signature-based near-replica detection via lexicon randomization
Detection of near duplicate documents is an important problem in many data mining and information filtering applications. When faced with massive quantities of data, traditional d...
Aleksander Kolcz, Abdur Chowdhury, Joshua Alspecto...
ICDAR
2009
IEEE
15 years 6 months ago
Scaling Up Whole-Book Recognition
We describe the results of large-scale experiments with algorithms for unsupervised improvement of recognition of book-images using fully automatic mutual-entropy-based model adap...
Pingping Xiu, Henry S. Baird
ICDAR
1999
IEEE
15 years 4 months ago
MergeLayouts: Overcoming Faulty Segmentations by a Comprehensive Voting of Commercial OCR Devices
In this paper, we will present a comprehensive voting approach, taking entire layouts obtained from commercial OCR devices as input. Such a layout comprises segments of three kind...
Stefan Klink, Thorsten Jäger
ERCIMDL
2008
Springer
15 years 1 months ago
Revisiting Lexical Signatures to (Re-)Discover Web Pages
A lexical signature (LS) is a small set of terms derived from a document that capture the "aboutness" of that document. An LS generated from a web page can be used to disc...
Martin Klein, Michael L. Nelson