Document clustering is a very hard task in Automatic Text Processing, since it requires extracting regular patterns from a document collection without a priori knowledge of the cat...
We present an approach to increasing the effectiveness of ranked-output retrieval systems that relies on graphical display and user manipulation of “views” of retrieva...
Maintaining and extending large thesauri is an important challenge facing digital libraries and IT businesses alike. In this paper we describe a method building on and extending ex...
Robert Meusel, Mathias Niepert, Kai Eckert, Heiner...
Search queries that are applied repeatedly over a period of time to extract relevant information from the World Wide Web are referred to as continuous search queries. The improvement of continuous sea...
This paper expands on a 1997 study of the amount and distribution of near-duplicate pages on the World Wide Web. We downloaded a set of 150 million web pages on a weekly basis ove...