People are seldom aware that their search queries frequently fail to match most of the relevant documents. This may not be a big problem for topics with a large and diverse set o...
In this paper, we introduce an information-theoretic method for estimating the usefulness of the hyperlink structure induced from the set of retrieved documents. We evaluate the e...
Recently, stability-based techniques have emerged as a very promising solution to the problem of cluster validation. An inherent drawback of these approaches is the computational c...
Web crawlers are increasingly used for focused tasks such as the extraction of data from Wikipedia or the analysis of social networks like last.fm. In these cases, pages are far m...
Franziska von dem Bussche, Klara A. Weiand, Benedi...
Increasingly, companies recognize that most of their important information does not exist in relational stores but in documents. For a long time, textual information has been rela...