
INEX
2005
Springer

A Flexible Structured-Based Representation for XML Document Mining

This paper reports on the INRIA group’s approach to XML mining in the INEX XML Mining track 2005. We use a flexible representation of XML documents that can take into account either the structure alone or both the structure and the content. Our approach represents each XML document by a set of its sub-paths, selected according to criteria such as length, whether the sub-path starts at the root, and whether it ends at a leaf. By treating those sub-paths as words, we can apply standard vocabulary-reduction methods and simple clustering methods such as k-means. We use an implementation of the clustering algorithm known as dynamic clouds, which can handle distinct groups of independent modalities placed in separate variables. This matters in our model because embedded sub-paths are not independent: we split potentially dependent paths into separate variables, so that each variable contains only independent paths. Experiments with the INEX collections show good results for the structure-only collection...
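The sub-path representation described in the abstract can be sketched roughly as follows. This is a minimal Python sketch under our own assumptions, not the authors' implementation. It covers only the structure-only case (tag paths, no text content). The function names `subpaths`, `split_by_length`, `reduce_vocabulary` and `to_vectors` are hypothetical. Document-frequency thresholding stands in for whatever vocabulary-reduction method the paper actually uses. Grouping sub-paths by length is one assumed way of placing embedded (dependent) sub-paths in different variables.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def root_to_leaf_paths(elem, prefix=()):
    """Yield every root-to-leaf tag path of an element tree as a tuple of tags."""
    path = prefix + (elem.tag,)
    children = list(elem)
    if not children:
        yield path
    for child in children:
        yield from root_to_leaf_paths(child, path)


def subpaths(xml_text, max_len=None, from_root=False, to_leaf=False):
    """Bag of contiguous sub-paths ("words") of a document's root-to-leaf paths.

    Hypothetical criteria mirroring those named in the abstract:
    max_len   -- keep only sub-paths of at most this many tags (None = no limit)
    from_root -- keep only sub-paths that start at the document root
    to_leaf   -- keep only sub-paths that end at a leaf element
    Every occurrence in every root-to-leaf path is counted.
    """
    bag = Counter()
    for path in root_to_leaf_paths(ET.fromstring(xml_text)):
        n = len(path)
        for i in range(n):
            for j in range(i + 1, n + 1):
                if from_root and i != 0:
                    continue
                if to_leaf and j != n:
                    continue
                if max_len is not None and j - i > max_len:
                    continue
                bag[path[i:j]] += 1  # each word is a tuple of tag names
    return bag


def split_by_length(bag):
    """Assumed split into separate variables, one per sub-path length.

    Two distinct sub-paths of equal length cannot be embedded in one
    another, so no variable contains a sub-path together with its own sub-path.
    """
    groups = defaultdict(Counter)
    for word, count in bag.items():
        groups[len(word)][word] = count
    return dict(groups)


def reduce_vocabulary(bags, min_df=2):
    """Keep sub-paths that occur in at least min_df documents (one common reduction)."""
    df = Counter()
    for bag in bags:
        df.update(bag.keys())
    return sorted(w for w, d in df.items() if d >= min_df)


def to_vectors(bags, vocab):
    """Turn each bag into a count vector over the reduced vocabulary."""
    return [[bag.get(w, 0) for w in vocab] for bag in bags]
```

The vectors produced by `to_vectors` could then be passed to any k-means implementation; the dynamic-clouds algorithm itself is not reproduced here. Grouping by length is attractive as an illustration because it needs no extra bookkeeping: equal length alone guarantees that the sub-paths in one group are not embedded in each other.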
Added 27 Jun 2010
Updated 27 Jun 2010
Type Conference
Year 2005
Where INEX
Authors Anne-Marie Vercoustre, Mounir Fegas, Saba Gul, Yves Lechevallier