Kernels for Structured Natural Language Data

13 years 7 months ago
Kernels for Structured Natural Language Data
This paper devises a novel kernel function for structured natural language data. In the field of Natural Language Processing, feature extraction consists of the following two steps: (1) syntactically and semantically analyzing raw data, i.e., character strings, then representing the results as discrete structures, such as parse trees and dependency graphs with part-of-speech tags; (2) creating (possibly high-dimensional) numerical feature vectors from the discrete structures. The new kernels, called Hierarchical Directed Acyclic Graph (HDAG) kernels, directly accept DAGs whose nodes can contain DAGs. HDAG data structures are needed to fully reflect the syntactic and semantic structures that natural language data inherently have. In this paper, we define the kernel function and show how it permits efficient calculation. Experiments demonstrate that the proposed kernels are superior to existing kernel functions, e.g., sequence kernels, tree kernels, and bag-of-words kernels.
Jun Suzuki, Yutaka Sasaki, Eisaku Maeda
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2003
Where NIPS
Authors Jun Suzuki, Yutaka Sasaki, Eisaku Maeda
Comments (0)