This paper presents results in automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising s...
We investigate three issues in distributed information retrieval, considering both TREC data and U.S. Patents: (1) topical organization of large text collections, (2) collection r...
Leah S. Larkey, Margaret E. Connell, James P. Call...
With the rise of photo-sharing websites such as Facebook and Flickr has come dramatic growth in the number of photographs online. Recent research in object recognition has used su...
Yunpeng Li, David J. Crandall, Daniel P. Huttenloc...
Identifying a subset of features that preserves classification accuracy is a problem of growing importance, because of the increasing size and dimensionality of real-world data se...
In this paper, we examine the performance of two policies for keyword selection over standard document corpora of varying properties. While in the corpus-based policy a single set ...