Fast Methods for Kernel-Based Text Analysis

15 years 7 months ago


Kernel-based learning (e.g., Support Vector Machines) has been successfully applied to many hard problems in Natural Language Processing (NLP). In NLP, although feature combinations are crucial to improving performance, they are selected heuristically. Kernel methods change this situation. The merit of kernel methods is that effective feature combinations are expanded implicitly, without loss of generality and without increasing the computational cost. Kernel-based text analysis shows excellent performance in terms of accuracy; however, these methods are usually too slow to apply to large-scale text analysis. In this paper, we extend a Basket Mining algorithm to convert a kernel-based classifier into a simple and fast linear classifier. Experimental results on English BaseNP Chunking, Japanese Word Segmentation and Japanese Dependency Parsing show that our new classifiers are about 30 to 300 times faster than the standard kernel-based classifiers.
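The idea of turning a kernel classifier into a linear one can be illustrated with a small sketch. It assumes a second-degree polynomial kernel over binary (set-valued) features, K(X, Z) = (1 + |X ∩ Z|)^2, which the abstract itself does not specify. It also expands every feature subset exhaustively rather than using the Basket Mining extension described above, so it shows only the equivalence of the two scores, not the speed-up. All names, data, and values are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# For s = |X ∩ Z|: (1 + s)^2 = 1 + 3*s + 2*C(s, 2).
# So each shared subset of size 0, 1, 2 contributes weight 1, 3, 2.
COEF = {0: 1.0, 1: 3.0, 2: 2.0}


def kernel_score(support_vectors, alphas, bias, z):
    """Score z with the kernel classifier: one kernel evaluation per support vector."""
    return bias + sum(
        a * (1 + len(x & z)) ** 2 for x, a in zip(support_vectors, alphas)
    )


def expand_weights(support_vectors, alphas):
    """Fold all support vectors into explicit weights on feature subsets of size <= 2."""
    weights = defaultdict(float)
    for x, a in zip(support_vectors, alphas):
        items = sorted(x)
        for size in range(3):
            for subset in combinations(items, size):
                weights[subset] += COEF[size] * a
    return dict(weights)


def linear_score(weights, bias, z):
    """Score z with the expanded linear classifier: table lookups only."""
    items = sorted(z)
    return bias + sum(
        weights.get(subset, 0.0)
        for size in range(3)
        for subset in combinations(items, size)
    )


# Hypothetical toy model: two support vectors with signed weights (alpha_i * y_i).
support_vectors = [frozenset({"a", "b", "c"}), frozenset({"b", "d"})]
alphas = [0.5, -1.0]
bias = 0.25
z = frozenset({"b", "c", "e"})

weights = expand_weights(support_vectors, alphas)
k = kernel_score(support_vectors, alphas, bias, z)
l = linear_score(weights, bias, z)
```

In this sketch the linear score depends only on the size of the input and the lookup table, not on the number of support vectors. The exhaustive table grows quickly with the number of features, which is why some form of selecting or pruning subsets is needed in practice.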

Taku Kudo, Yuji Matsumoto
