We consider the problem of DUST: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
There are several pieces of information that can be utilized in order to improve the efficiency of similarity searches on high-dimensional data. The most commonly used information...
In this paper an effective context-based approach for interactive similarity queries is presented. By exploiting the notion of image “context”, it is possible to asso...
The definition of similarity measures, one core component of every CBR application, leads to a serious knowledge acquisition problem if domain- and application-specific requirem...
In many text retrieval tasks, it is highly desirable to obtain a "similarity profile" of the document collection for a given query. We propose sampling-based techniques ...