Distributional Features for Text Categorization

Abstract-- Text categorization is the task of assigning predefined categories to natural language text. With the widely used `bag of words' representation, previous research usually assigns each word a value that indicates whether the word appears in the document concerned or how frequently it appears. Although these values are useful for text categorization, they do not fully express the abundant information contained in the document. This paper explores the effect of other types of values, which express the distribution of a word in the document. These novel values assigned to a word are called distributional features, which include the compactness of the appearances of the word and the position of the first appearance of the word. The proposed distributional features are exploited by a tfidf-style equation, and different features are combined using ensemble learning techniques. Experiments show that the distributional features are useful for text categorization. In cont...
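
The abstract names two distributional features, the position of a word's first appearance and the compactness of its appearances, but this listing does not give their formulas. The Python sketch below is one plausible reading, not the paper's definitions: the function names `first_appearance` and `position_spread`, the whitespace tokenization, and the use of the first-to-last span as a compactness proxy are all assumptions made for illustration.

```python
from typing import List, Optional


def first_appearance(tokens: List[str], word: str) -> Optional[float]:
    """Relative position of the first occurrence of `word` (0.0 = document start).

    Returns None when the word does not occur. Hypothetical helper.
    """
    if word not in tokens:
        return None
    return tokens.index(word) / len(tokens)


def position_spread(tokens: List[str], word: str) -> Optional[float]:
    """Assumed compactness proxy: span between the first and last occurrence
    of `word`, as a fraction of document length. Smaller means more compact.

    Returns None when the word does not occur. Hypothetical helper.
    """
    positions = [i for i, token in enumerate(tokens) if token == word]
    if not positions:
        return None
    return (positions[-1] - positions[0]) / len(tokens)


# Toy document, tokenized on whitespace (an assumption for this sketch).
doc = "the cat sat on the mat while the dog slept".split()

for word in ("the", "cat", "dog"):
    print(word, first_appearance(doc, word), position_spread(doc, word))
```

Two words with the same frequency can differ on both values, which is the kind of information a plain presence or frequency value does not capture.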
Xiao-Bing Xue, Zhi-Hua Zhou
Added 14 Oct 2010
Updated 14 Oct 2010
Type Conference
Year 2006
Where ECML
Authors Xiao-Bing Xue, Zhi-Hua Zhou