A Boosting Algorithm for Classification of Semi-Structured Text

The focus of research in text classification has expanded from simple topic identification to more challenging tasks such as opinion/modality identification. Unfortunately, the latter goals exceed the ability of the traditional bag-of-words representation, and a richer, more structural representation is required. Accordingly, learning algorithms must be created that can handle the structures observed in texts. In this paper, we propose a Boosting algorithm that captures sub-structures embedded in texts. The proposal consists of i) decision stumps that use subtrees as features and ii) a Boosting algorithm that employs the subtree-based decision stumps as weak learners. We also discuss the relation between our algorithm and SVMs with tree kernels. Two experiments on opinion/modality classification confirm that subtree features are important.
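The two components named in the abstract, subtree-based decision stumps and a Boosting loop that uses them as weak learners, can be illustrated with a small Python sketch. This is not the authors' implementation. It assumes that each text has already been reduced to a set of subtree identifiers (written here as bracketed strings), it uses a textbook AdaBoost update, and it searches candidate stumps exhaustively. All function names and the toy data are hypothetical.

```python
import math

def stump_predict(feature, sign, doc_features):
    """Decision stump: vote `sign` if the subtree occurs in the text, else `-sign`."""
    return sign if feature in doc_features else -sign

def best_stump(docs, labels, weights):
    """Exhaustively pick the (feature, sign) pair with the largest weighted gain."""
    best, best_gain = None, -1.0
    for feature in sorted(set().union(*docs)):
        # Gain for sign=+1; flipping the sign simply negates the gain.
        gain = sum(w * y * stump_predict(feature, 1, d)
                   for d, y, w in zip(docs, labels, weights))
        for sign, g in ((1, gain), (-1, -gain)):
            if g > best_gain:
                best, best_gain = (feature, sign), g
    return best, best_gain

def train(docs, labels, rounds=10):
    """Textbook AdaBoost with subtree stumps as weak learners (illustrative only)."""
    n = len(docs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best, gain = best_stump(docs, labels, weights)
        if best is None:  # no candidate subtrees at all
            break
        feature, sign = best
        # Weights sum to 1, so weighted error = (1 - gain) / 2; clamp to avoid log(0).
        error = min(max((1.0 - gain) / 2.0, 1e-10), 1.0 - 1e-10)
        alpha = 0.5 * math.log((1.0 - error) / error)
        ensemble.append((alpha, feature, sign))
        if gain >= 1.0 - 1e-12:  # a single stump already separates the data
            break
        # Increase the weight of misclassified texts, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(feature, sign, d))
                   for d, y, w in zip(docs, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def classify(ensemble, doc_features):
    """Sign of the weighted vote over all stumps."""
    score = sum(a * stump_predict(f, s, doc_features) for a, f, s in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: +1 = modal sentence, -1 = plain assertion.
toy_docs = [
    {"(S NP VP)", "(VP might)"},
    {"(S NP VP)", "(VP is)"},
    {"(VP might)", "(NP it)"},
    {"(NP it)", "(VP is)"},
]
toy_labels = [1, -1, 1, -1]
toy_model = train(toy_docs, toy_labels)
```

Calling `classify(toy_model, {"(VP might)"})` applies the learned stumps to an unseen set of subtrees. A realistic system would also need a parser to produce the trees and a far more efficient search over the space of candidate subtrees than the exhaustive loop used here.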

Taku Kudo, Yuji Matsumoto

Algorithm | Boosting Algorithm | Decision Stumps | EMNLP 2004 | EMNLP 2007 |

» Boosting strategy for classification

» Email classification with cotraining

» Boosting SVM classifiers by ensemble

» Combining ILP with Semisupervised Learning for Web Page Categorization

Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2004
Where	EMNLP
Authors	Taku Kudo, Yuji Matsumoto
