Sciweavers

ACL
2008

Semi-Supervised Sequential Labeling and Segmentation Using Giga-Word Scale Unlabeled Data

13 years 6 months ago
Semi-Supervised Sequential Labeling and Segmentation Using Giga-Word Scale Unlabeled Data
This paper provides evidence that the use of more unlabeled data in semi-supervised learning can improve the performance of Natural Language Processing (NLP) tasks, such as part-of-speech tagging, syntactic chunking, and named entity recognition. We first propose a simple yet powerful semi-supervised discriminative model appropriate for handling large scale unlabeled data. Then, we describe experiments performed on widely used test collections, namely, PTB III data, CoNLL'00 and '03 shared task data for the above three NLP tasks, respectively. We incorporate up to 1G-words (one billion tokens) of unlabeled data, which is the largest amount of unlabeled data ever used for these tasks, to investigate the performance improvement. In addition, our results are superior to the best reported results for all of the above test collections.
Jun Suzuki, Hideki Isozaki
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where ACL
Authors Jun Suzuki, Hideki Isozaki
Comments (0)