WWW
2009
ACM

A densitometric analysis of web template content

What makes template content in the Web so special that we need to remove it? In this paper I present a large-scale aggregate analysis of textual Web content, corroborating statistical laws from the field of Quantitative Linguistics. I analyze the idiosyncrasy of template content compared to regular "full text" content and derive a simple yet suitable quantitative model.

Categories and Subject Descriptors: H.3.3 [Information Systems]: Information Search and Retrieval; G.3 [Probability and Statistics]: Distribution Functions

General Terms: Theory, Experimentation, Measurement

Keywords: Content Analysis, Template Detection, Template Removal, Web Page Segmentation, Noise Removal

Christian Kohlschütter
Added: 21 Nov 2009
Updated: 21 Nov 2009
Type: Conference
Year: 2009
Where: WWW
Authors: Christian Kohlschütter