Click data captures many users’ document preferences for a query and has been shown to help significantly improve search engine ranking. However, most click data is noisy and of...
Certain forms of mathematical expression are used more often than others in practice. A quantitative understanding of actual usage can provide additional information to improve th...
The self-organizing desk is a system that enhances a physical desk-top with electronic information. It can remember, organize, update, and manipulate the information contained in ...
Abstract. Most common feature selection techniques for document categorization are supervised and require lots of training data in order to accurately capture the descriptive and d...
Suppose there is a large corpus of XML documents, each of which describes a movie released in the last 30 years (for example, extracted from IMDB). A movie enthusiast wants to mak...