ICONIP
2007

Classification of Documents Based on the Structure of Their DOM Trees

In this paper, we discuss kernels that can be applied to the classification of XML documents based on their DOM trees. DOM trees are ordered trees in which every node may be labeled by a vector of attributes, including its XML tag and its textual content. We describe five new kernels suitable for such structures: a kernel based on predefined structural features; a tree kernel derived from the well-known parse tree kernel; the set tree kernel, which allows permutations of children; the string tree kernel, an extension of the so-called partial tree kernel; and the soft tree kernel. We evaluate the kernels experimentally on a corpus containing the DOM trees of newspaper articles and on the well-known SUSANNE corpus.
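For readers unfamiliar with the parse tree kernel mentioned in the abstract, the following is a minimal sketch of that classic idea applied to DOM trees. It is not one of the five kernels from the paper. It assumes that nodes are labeled by their XML tag only (textual content and other attributes are ignored), and the names `production`, `common_fragments`, `tree_kernel`, and the `decay` parameter are hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET


def production(node):
    # Assumption: a node's "production" is its tag plus the ordered tags of its children.
    return (node.tag, tuple(child.tag for child in node))


def common_fragments(n1, n2, decay=1.0):
    # Weighted count of tree fragments rooted at both n1 and n2,
    # following the standard parse tree kernel recursion.
    if production(n1) != production(n2):
        return 0.0
    result = decay
    # Productions match, so both nodes have the same number of children.
    for c1, c2 in zip(n1, n2):
        result *= 1.0 + common_fragments(c1, c2, decay)
    return result


def tree_kernel(xml1, xml2, decay=1.0):
    # Sum the fragment counts over all pairs of nodes from the two DOM trees.
    t1 = ET.fromstring(xml1)
    t2 = ET.fromstring(xml2)
    return sum(common_fragments(a, b, decay) for a in t1.iter() for b in t2.iter())


# Two toy documents that differ only in the number of paragraphs.
doc_a = "<article><title/><p/><p/></article>"
doc_b = "<article><title/><p/></article>"

print(tree_kernel(doc_a, doc_a))
print(tree_kernel(doc_a, doc_b))
```

Because the roots of the two toy documents have different child sequences, only the matching `title` and `p` nodes contribute to the cross-document value. This strictness with respect to the order and number of children is the kind of limitation that kernels allowing permutations of children or softer matching are meant to relax.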
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2007
Where ICONIP
Authors Peter Geibel, Olga Pustylnikov, Alexander Mehler, Helmar Gust, Kai-Uwe Kühnberger