CIKM
2006
Springer

Text classification improved through multigram models

Classification algorithms and document representation approaches are two key elements of a successful document classification system. In the past, much work has been conducted to find better ways to represent documents. However, most of these attempts rely on extra resources such as WordNet, or they suffer from extremely high dimensionality. In this paper, we propose a new document representation approach based on n-multigram language models. This approach can automatically discover the hidden semantic sequences in the documents under each category. Based on n-multigram language models and n-gram language models, we put forward two text classification algorithms. Experiments on RCV1 show that our proposed algorithm based on n-multigram models alone achieves similar or even better classification performance than the classifier based on n-gram models, while its model size is much smaller. Another proposed algorithm base...
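
For readers unfamiliar with the n-gram baseline the abstract compares against, the following is a minimal sketch of a per-category bigram language model classifier. It is not the authors' n-multigram method, and nothing in it is taken from the paper: the class name `BigramClassifier`, the `<s>` start marker, add-one smoothing, and uniform category priors are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def bigrams(tokens):
    """Return adjacent token pairs, with a start marker prepended."""
    padded = ["<s>"] + list(tokens)
    return list(zip(padded, padded[1:]))


class BigramClassifier:
    """Hypothetical baseline: one add-one-smoothed bigram model per category."""

    def __init__(self):
        self.pair_counts = defaultdict(Counter)     # category -> (prev, word) counts
        self.context_counts = defaultdict(Counter)  # category -> prev-token counts
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.vocab.update(tokens)
            for prev, word in bigrams(tokens):
                self.pair_counts[label][(prev, word)] += 1
                self.context_counts[label][prev] += 1
        return self

    def log_likelihood(self, tokens, label):
        # Add-one smoothing; the extra 1 reserves mass for unseen words.
        v = len(self.vocab) + 1
        total = 0.0
        for prev, word in bigrams(tokens):
            num = self.pair_counts[label][(prev, word)] + 1
            den = self.context_counts[label][prev] + v
            total += math.log(num / den)
        return total

    def predict(self, tokens):
        # Uniform category priors are assumed, so only the likelihood matters.
        categories = list(self.pair_counts)
        return max(categories, key=lambda c: self.log_likelihood(tokens, c))


train_docs = [
    ["stocks", "fell", "sharply"],
    ["markets", "fell", "today"],
    ["team", "won", "match"],
    ["team", "lost", "match"],
]
train_labels = ["finance", "finance", "sports", "sports"]

clf = BigramClassifier().fit(train_docs, train_labels)
print(clf.predict(["team", "won"]))
```

A model of this kind stores one count per observed word pair per category, which hints at why model size is a concern for n-gram classifiers as n and the vocabulary grow.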

Dou Shen, Jian-Tao Sun, Qiang Yang, Zheng Chen

CIKM 2006 | Information Management | Language Models | N-gram Models | N-multigram Language Models

Related Content

» Multilayer model for Arabic text compression

» A Characterization of Wordnet Features in Boolean Models For Text Classification

» Topicbridged PLSA for crossdomain text classification

» Ontology Evaluation through Text Classification

» Knowledge discovery through directed probabilistic topic models a survey

» Improving Knowledge Discovery in Document Collections through Combining Text Retrieval and...

» Statement map reducing web information credibility noise through opinion classification

» Boosting for Text Classification with Semantic Features

» Improving the classification of newsgroup messages through social network analysis


Added	20 Aug 2010
Updated	20 Aug 2010
Type	Conference
Year	2006
Where	CIKM
Authors	Dou Shen, Jian-Tao Sun, Qiang Yang, Zheng Chen
