Both document clustering and word clustering are well-studied problems. Most existing algorithms cluster documents and words separately but not simultaneously. In this paper we pr...
This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. ...
Kamal Nigam, Andrew McCallum, Sebastian Thrun, Tom...
We propose a new Web search system that helps users clarify their information needs through interaction. The system represents the user's information needs using a query grap...
For people who use text-based web browsers, graphs, diagrams, and pictures are inaccessible. Yet, such diagrams are quite prominent in documents commonly found on the web. In this...
Kathleen F. McCoy, Sandra Carberry, Tom Roper, Nan...
In this paper, we present the AutoCat system for product classification. AutoCat uses a vector space model, modified to consider product attributes unavailable in traditional docu...