Evaluation of IR systems has always been difficult because of the need for manually assessed relevance judgments. The advent of large editor-driven taxonomies on the web opens the...
Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury...
A website can regulate search engine crawler access to its content using the robots exclusion protocol, specified in its robots.txt file. The rules in the protocol enable the site...
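The access rules described above can be exercised directly with Python's standard-library `urllib.robotparser`; the hostname `example.com` and the rule set below are illustrative assumptions, not taken from the paper.

```python
from urllib.robotparser import RobotFileParser

# Parse an illustrative robots.txt (normally fetched from
# https://example.com/robots.txt via set_url() + read()).
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",   # crawlers may not fetch anything under /private/
    "Allow: /",              # everything else is permitted
])

# Check whether a generic crawler ("*") may fetch specific URLs.
print(rp.can_fetch("*", "https://example.com/private/page.html"))  # False
print(rp.can_fetch("*", "https://example.com/index.html"))         # True
```

Rules are matched per user agent, so a site can grant or deny access differently for individual crawlers by adding further `User-agent` groups.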
Abstract. In this paper we present static and dynamic studies of duplicate and near-duplicate documents on the Web. The static and dynamic studies involve the analysis of similar c...
The vast majority of today’s commercially deployed image search systems employ techniques that are largely indistinguishable from text-document search – t...
We propose the concept of research trails to help web users create and reestablish context across fragmented research processes without requiring them to explicitly structure and ...