Web search engines are facing formidable performance challenges as they need to process thousands of queries per second over billions of documents. To deal with this heavy workloa...
We study the problem of creating highly compressed full-text index structures for versioned document collections, that is, collections that contain multiple versions of each docume...
The Semantic Web, which represents a web of knowledge, offers new opportunities to search for knowledge and information. To harvest such search power requires robust and scalable ...
We present a probabilistic model for a document corpus that combines many of the desirable features of previous models. The model is called “GaP” for Gamma-Poisson, the distri...
This paper aims to quantify two common assumptions about social tagging: (1) that tags are “meaningful” and (2) that the tagging process is influenced by tag suggestions. For...
Fabian M. Suchanek, Milan Vojnovic, Dinan Gunaward...