Text Classification using String Kernels

We propose a novel approach for categorizing text documents based on the use of a special kernel. The kernel is an inner product in the feature space generated by all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text, though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those occurrences that are close to contiguous. A direct computation of this feature vector would involve a prohibitive amount of computation even for modest values of k, since the dimension of the feature space grows exponentially with k. The paper describes how, despite this fact, the inner product can be efficiently evaluated by a dynamic programming technique. Experimental comparisons of the kernel with a standard word feature space kernel (Joachims, 1998) show positive results on modestly sized datasets. The case of contiguous subsequence...
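The dynamic programming evaluation mentioned in the abstract can be illustrated with a short Python sketch. It is not the authors' code. It follows the standard subsequence-kernel recursion as commonly stated, in which an auxiliary quantity K'_i (here `kp[i]`) accumulates decay-weighted partial matches over prefixes of the two strings. The function name `ssk`, the parameter names `k` and `lam` (the decay factor, assumed to lie in (0, 1]), and the table layout are assumptions made for illustration.

```python
def ssk(s, t, k, lam):
    """Subsequence kernel K_k(s, t) with decay factor lam (illustrative sketch).

    Each common subsequence u of length k contributes lam ** (span_s + span_t),
    where span_* is the full length of the occurrence in the respective string.
    Runs in O(k * len(s) * len(t)) time.
    """
    n, m = len(s), len(t)
    if k < 1 or min(n, m) < k:
        return 0.0

    # kp[i][a][b] holds K'_i(s[:a], t[:b]); K'_0 is 1 for every prefix pair.
    kp = [[[0.0] * (m + 1) for _ in range(n + 1)] for _ in range(k)]
    for a in range(n + 1):
        for b in range(m + 1):
            kp[0][a][b] = 1.0

    for i in range(1, k):
        for a in range(i, n + 1):
            kpp = 0.0  # running value of K''_i(s[:a], t[:b]) as b grows
            for b in range(i, m + 1):
                kpp *= lam  # every earlier match is one character further away
                if s[a - 1] == t[b - 1]:
                    kpp += lam * lam * kp[i - 1][a - 1][b - 1]
                # Either s[a-1] is skipped (decay by lam) or it is matched (kpp).
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp

    # Close each length-k subsequence with a final matching character pair.
    total = 0.0
    for a in range(k, n + 1):
        for b in range(k, m + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[k - 1][a - 1][b - 1]
    return total


if __name__ == "__main__":
    print(ssk("cat", "car", 2, 0.5))
    print(ssk("cat", "cat", 2, 0.5))
```

For "cat" and "car" with k = 2, the only shared subsequence is "ca", which spans two characters in each string, so the kernel value is lam to the fourth power. For "cat" against itself the subsequences "ca", "at" and the gapped "ct" contribute 2·lam⁴ + lam⁶. In practice the value is often normalized by the self-kernels of the two strings so that document length does not dominate; that step is omitted here.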

Huma Lodhi, John Shawe-Taylor, Nello Cristianini, Christopher J. C. H. Watkins

Feature Space | Feature Space Kernel | Inner Product | NIPS 2000 | NIPS 2007

» Distribution kernels based on moments of counts

» Video Event Classification Using Bag of Words and String Kernels

» Video event classification using string kernels

» An LZ78 Based String Kernel

» Inverted Index based Modified Version of KNN for Text Categorization

» Effectiveness of Methods for Syntactic and Semantic Recognition of Numeral Strings Tradeof...

» Text Classification Using Tree Kernels and Linguistic Information

» Edit distance-based kernel functions for structural pattern classification

Post Info

Added	01 Nov 2010
Updated	01 Nov 2010
Type	Conference
Year	2000
Where	NIPS
Authors	Huma Lodhi, John Shawe-Taylor, Nello Cristianini, Christopher J. C. H. Watkins
