Sciweavers

CIKM
2006
Springer

Text classification improved through multigram models

13 years 7 months ago
Text classification improved through multigram models
Classification algorithms and document representation approaches are two key elements for a successful document classification system. In the past, much work has been conducted to find better ways to represent documents. However, most of the attempts rely on certain extra resources such as WordNet, or they face the problem of extremely high dimension. In this paper, we propose a new document representation approach based on n-multigram language models. This approach can automatically discover the hidden semantic sequences in the documents under each category. Based on n-multigram language models and n-gram language models, we put forward two text classification algorithms. The experiments on RCV1 show that our proposed algorithm based on n-multigram models alone can achieve the similar or even better classification performance compared with the classifier based on n-gram models but the model size of our algorithm is much smaller than that of the latter. Another proposed algorithm base...
Dou Shen, Jian-Tao Sun, Qiang Yang, Zheng Chen
Added 20 Aug 2010
Updated 20 Aug 2010
Type Conference
Year 2006
Where CIKM
Authors Dou Shen, Jian-Tao Sun, Qiang Yang, Zheng Chen
Comments (0)