Evaluation of IR systems has always been difficult because of the need for manually assessed relevance judgments. The advent of large editor-driven taxonomies on the web opens the...
Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury...
A website can regulate search engine crawler access to its content using the robots exclusion protocol, specified in its robots.txt file. The rules in the protocol enable the site...
Abstract. In this paper we present static and dynamic studies of duplicate and near-duplicate documents in the Web. The static and dynamic studies involve the analysis of similar c...
Description logics (DLs) are widely employed in recent semantic web application systems. However, classical description logics are limited when dealing with imprecise concepts and ...
The vast majority of the features used in today’s commercially deployed image search systems employ techniques that are largely indistinguishable from text-document search – t...