In this paper, we propose an approach to automatically mine event evolution graphs from newswires on the Web. Event evolution graph is a directed graph in which the vertices and e...
Detection of near duplicate documents is an important problem in many data mining and information filtering applications. When faced with massive quantities of data, traditional d...
Aleksander Kolcz, Abdur Chowdhury, Joshua Alspecto...
We describe the results of large-scale experiments with algorithms for unsupervised improvement of recognition of book-images using fully automatic mutual-entropy-based model adap...
In this paper, we will present a comprehensive voting approach, taking entire layouts obtained from commercial OCR devices as input. Such a layout comprises segments of three kind...
A lexical signature (LS) is a small set of terms derived from a document that capture the "aboutness" of that document. A LS generated from a web page can be used to disc...