With the exponential growth of the available information on the World Wide Web, a traditional search engine, even if based on sophisticated document indexing algorithms, has diffi...
In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
Category ranking provides a way to classify plain text documents into a pre-determined set of categories. This work proposes to have a look at typical document collections and ana...
In this paper, we propose an alternative method for accessing the content of Greek historical documents printed during the 17th and 18th centuries by searching words directly in d...
Anastasios L. Kesidis, Eleni Galiotou, Basilios Ga...
During the last decade national archives, libraries, museums and companies started to make their records, books and files electronically available. In order to allow efficient ac...
Andreas Stoffel, David Spretke, Henrik Kinnemann, ...