This paper proposes a method of collecting a dozen terms that are closely related to a given seed term. The proposed method consists of three steps. The first step, compiling cor...
We propose a hierarchical approach to document categorization that requires no pre-configuration and maps the semantic document space to a predefined taxonomy. The utilizatio...
Robert Wetzker, Tansu Alpcan, Christian Bauckhage,...
Summarization of text documents is increasingly important with the amount of data available on the Internet. The large majority of current approaches view documents as linear sequ...
Search engines are powerful tools to find information on the Web. However, they commonly return a lot of irrelevant documents when the users’ queries are not specific enough. To...
We consider the problem of sampling URLs uniformly at random from the Web. A tool for sampling URLs uniformly can be used to estimate various properties of Web pages, such as the ...
Monika Rauch Henzinger, Allan Heydon, Michael Mitz...