Sciweavers

EMNLP
2010

A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension

13 years 2 months ago
A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension
Several recent discourse parsers have employed fully-supervised machine learning approaches. These methods require human annotators to beforehand create an extensive training corpus, which is a time-consuming and costly process. On the other hand, unlabeled data is abundant and cheap to collect. In this paper, we propose a novel semi-supervised method for discourse relation classification based on the analysis of cooccurring features in unlabeled data, which is then taken into account for extending the feature vectors given to a classifier. Our experimental results on the RST Discourse Treebank corpus and Penn Discourse Treebank indicate that the proposed method brings a significant improvement in classification accuracy and macro-average F-score when small training datasets are used. For instance, with training sets of c.a. 1000 labeled instances, the proposed method brings improvements in accuracy and macro-average F-score up to 50% compared to a baseline classifier. We believe that...
Hugo Hernault, Danushka Bollegala, Mitsuru Ishizuk
Added 11 Feb 2011
Updated 11 Feb 2011
Type Journal
Year 2010
Where EMNLP
Authors Hugo Hernault, Danushka Bollegala, Mitsuru Ishizuka
Comments (0)