Near-duplicate detection is not only an important pre and post processing task in Information Retrieval but also an effective spam-detection technique. Among different approache...
This paper focuses on decentralized personalized search engines. It is composed of three parts. Firstly, we formulate the problem and we propose a graph-based measure of quality o...
In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
In this paper, we present our online summarization system of web topics. The user defines the topic by a set of keywords. Then the system searches the Web for the relevant documen...
Abstract. In semantic web applications where query initiators and information providers do not necessarily share the same ontology, semantic interoperability generally relies on on...
Anthony Ventresque, Sylvie Cazalens, Philippe Lama...