Today, search engine is the most commonly used tool for Web information retrieval, however, its current status is still far from satisfaction. This paper focuses on clustering Web...
User queries to search engines are observed to predominantly contain inflected content words but lack stopwords and capitalization. Thus, they often resemble natural language que...
Topical noise in blogs arises when bloggers digress from the central topical thrust of their blogs. We introduce a method to explicitly incorporate a model of topical noise into a...
We present similarity-based methods to cluster digital photos by time and image content. The approach is general, unsupervised, and makes minimal assumptions regarding the structu...
Matthew L. Cooper, Jonathan Foote, Andreas Girgens...
Virtually all proposals for querying XML include a class of query we term "containment queries". It is also clear that in the foreseeable future, a substantial amount of...
Chun Zhang, Jeffrey F. Naughton, David J. DeWitt, ...