This paper presents an extensive study about the evolution of textual content on the Web, which shows how some new pages are created from scratch while others are created using al...
The retrieval of similar documents in the Web from a given document is different in many aspects from information retrieval based on queries generated by regular search engine use...
Felipe Bravo-Marquez, Gaston L'Huillier, Sebasti&a...
Marking up search queries with linguistic annotations such as part-of-speech tags, capitalization, and segmentation, is an important part of query processing and understanding in ...
While query expansion techniques have been shown to improve retrieval performance in a centralized setting, they have not been well studied in a federated setting. In this paper, ...
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...