Web pages are more than text and they contain much contextual and structural information, e.g., the title, the meta data, the anchor text, etc., each of which can be seen as a dat...
This paper explores the use of hierarchical structure for classifying a large, heterogeneous collection of web content. The hierarchical structure is initially used to train diffe...
We describe and evaluate a new method of automatic seed word selection for unsupervised sentiment classification of product reviews in Chinese. The whole method is unsupervised an...
In many important text classification problems, acquiring class labels for training documents is costly, while gathering large quantities of unlabeled data is cheap. This paper sh...
Kamal Nigam, Andrew McCallum, Sebastian Thrun, Tom...
In many applications of data mining we know beforehand that the response variable should be increasing (or decreasing) in the attributes. Such relations between response and attrib...