StatiX: making XML count

The availability of summary data for XML documents has many applications, from providing users with quick feedback about their queries to cost-based storage design and query optimization. StatiX is a novel XML Schema-aware statistics framework that exploits the structure derived from the regular expressions that define elements in an XML Schema to pinpoint places in the schema that are likely sources of structural skew. As we discuss below, this information can be used to build concise yet accurate statistical summaries for XML data. StatiX leverages standard XML technology, notably XML Schema validators, to gather statistics, and it uses histograms to summarize both the structure and the values in an XML document. In this paper we describe the StatiX system. We develop algorithms that decompose schemas to obtain statistics at different granularities and discuss how statistics can be gathered as documents are validated. We also present an experimental evaluation which demonstrates the ...
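The abstract does not spell out how StatiX builds its histograms. As a rough illustration of the kind of structural skew a histogram can capture, here is a minimal sketch that counts how many `<paper>` children each `<author>` element has and summarizes those counts with a plain equi-width histogram. It is not the authors' implementation. It parses the document with Python's standard `xml.etree.ElementTree` instead of collecting statistics inside an XML Schema validator. The helper names `child_counts` and `equi_width_histogram`, along with the sample document, are hypothetical.

```python
import xml.etree.ElementTree as ET

def child_counts(xml_text, parent_tag, child_tag):
    """Hypothetical helper: number of direct <child_tag> children under each <parent_tag>."""
    root = ET.fromstring(xml_text)
    return [len(parent.findall(child_tag)) for parent in root.iter(parent_tag)]

def equi_width_histogram(values, buckets):
    """Hypothetical helper: split [min, max] into equal-width ranges and count values per range.

    Returns a list of (low, high, count) tuples. This is a generic equi-width
    histogram, not necessarily the histogram variant StatiX uses.
    """
    if not values:
        return []
    lo, hi = min(values), max(values)
    width = (hi - lo) / buckets or 1  # avoid zero width when all values are equal
    counts = [0] * buckets
    for v in values:
        # Clamp so that the maximum value falls into the last bucket.
        i = min(int((v - lo) / width), buckets - 1)
        counts[i] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]

# Sample document: one "prolific" author and several with few papers (structural skew).
doc = (
    "<site>"
    "<author><paper/><paper/><paper/><paper/></author>"
    "<author><paper/></author>"
    "<author><paper/></author>"
    "<author/>"
    "</site>"
)
counts = child_counts(doc, "author", "paper")
print(counts)                           # [4, 1, 1, 0]
print(equi_width_histogram(counts, 2))  # [(0.0, 2.0, 3), (2.0, 4.0, 1)]
```

A single average (1.5 papers per author) would hide the fact that most authors have at most one paper while one author has four. Even a two-bucket histogram keeps that skew visible, which is the kind of information a cost-based optimizer needs when estimating result sizes.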
Added 08 Dec 2009
Updated 08 Dec 2009
Type Conference
Year 2002
Authors Juliana Freire, Jayant R. Haritsa, Maya Ramanath, Prasan Roy, Jérôme Siméon