
CIKM
2011
Springer

The quality of the XML web

We collect evidence to answer the following question: Is the quality of the XML documents found on the web sufficient to apply XML technologies such as XQuery, XPath, and XSLT? XML collections from the web have previously been studied statistically, but no detailed information about the quality of XML documents on the web has been available to date. We address this shortcoming in this study. We gathered 180K XML documents from the web. Their quality is surprisingly good: 85.4% are well-formed, and 99.5% of all specified encodings are correct. Validity needs serious attention: only 25% of all files contain a reference to a DTD or XSD, and of these just one third are actually valid. Errors are studied in detail. Automatic error repair seems promising. Our study is well documented and easily repeatable, which paves the way for a periodic quality assessment of the XML web. The full paper and all data are publicly available at http://data.politicalmashup.nl/xmlweb. Categories and Subject Descriptors...
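As a rough illustration of the kind of well-formedness measurement the abstract reports, the following minimal sketch computes the share of documents that parse without a syntax error. It is not the authors' tooling: the function names `is_well_formed` and `well_formed_share` and the sample documents are hypothetical, and the sketch uses only Python's standard `xml.etree.ElementTree` parser. It does not check encodings or DTD/XSD validity.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes) -> bool:
    """Return True if `data` parses as XML without a syntax error.

    Hypothetical helper for illustration; not the method used in the paper.
    """
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


def well_formed_share(documents) -> float:
    """Percentage of well-formed documents in a list of byte strings."""
    if not documents:
        return 0.0
    ok = sum(is_well_formed(d) for d in documents)
    return 100.0 * ok / len(documents)


# Made-up sample documents (assumption: not taken from the study's data set).
sample = [
    b"<a><b/></a>",                                        # well-formed
    b"<a><b></a>",                                         # mismatched tags
    b"<?xml version='1.0' encoding='UTF-8'?><r>text</r>",  # well-formed
    b"<a>unclosed",                                        # missing end tag
]

print(well_formed_share(sample))  # prints 50.0
```

A check like this covers only the first of the three quality aspects named in the abstract; verifying declared encodings and validating against a referenced DTD or XSD would need additional steps.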
Steven Grijzenhout, Maarten Marx
Added 13 Dec 2011
Updated 13 Dec 2011
Type Journal
Year 2011
Where CIKM
Authors Steven Grijzenhout, Maarten Marx