Web-Site Boundary Detection

Defining the boundaries of a web-site, for example for archiving or information-retrieval purposes, is an important but complicated task. This paper proposes a web-page clustering approach to boundary detection. The principal issue is feature selection, which is hampered by the lack of a clear understanding of what a web-site is. The paper therefore proposes a definition of a web-site, founded on the principle of user intention and directed at the boundary detection problem. It then reports on a sequence of experiments that use a number of clustering techniques, and a wide range of features and combinations of features, to identify web-site boundaries. The preliminary results suggest that, in general, a combination of features produces the most appropriate result.
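To make the idea of clustering web pages on combined features concrete, here is a minimal, hypothetical sketch. It is not the authors' method. The choice of features (URL host/path tokens and page-text terms), the equal weights `w_url`/`w_text`, the use of k-means with farthest-point initialisation, and the rule "the site is the cluster containing a seed page" are all assumptions made for illustration. The page data is invented.

```python
import math
import re
from urllib.parse import urlparse


def tokens(s):
    # Lower-case alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", s.lower())


def bag(prefix, toks):
    # Term-frequency vector; the prefix keeps URL and text features apart.
    vec = {}
    for t in toks:
        vec[prefix + t] = vec.get(prefix + t, 0) + 1
    return vec


def normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else dict(vec)


def combine(page, w_url=0.5, w_text=0.5):
    # Hypothetical feature combination: weighted sum of two unit vectors.
    p = urlparse(page["url"])
    u = normalize(bag("u:", tokens(p.netloc + " " + p.path)))
    t = normalize(bag("t:", tokens(page["text"])))
    out = {k: w_url * v for k, v in u.items()}
    for k, v in t.items():
        out[k] = out.get(k, 0.0) + w_text * v
    return out


def sq_dist(a, b):
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b))


def mean(vectors):
    acc = {}
    for v in vectors:
        for k, x in v.items():
            acc[k] = acc.get(k, 0.0) + x
    return {k: x / len(vectors) for k, x in acc.items()}


def kmeans(vectors, k, iters=20):
    # Farthest-point initialisation keeps the result reproducible.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        new = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            new.append(mean(members) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return labels


def site_boundary(pages, seed_index=0, k=2, w_url=0.5, w_text=0.5):
    # Assumed rule: the web-site is the cluster that contains the seed page.
    vecs = [combine(p, w_url, w_text) for p in pages]
    labels = kmeans(vecs, k)
    return [p["url"] for p, lab in zip(pages, labels) if lab == labels[seed_index]]


pages = [
    {"url": "http://www.example.ac.uk/~jsmith/index.html", "text": "John Smith home page research publications teaching"},
    {"url": "http://www.example.ac.uk/~jsmith/papers.html", "text": "John Smith publications research papers"},
    {"url": "http://www.example.ac.uk/~jsmith/teaching.html", "text": "John Smith teaching modules lectures"},
    {"url": "http://www.example.ac.uk/news/2010.html", "text": "university news events campus announcements"},
    {"url": "http://www.example.ac.uk/news/admissions.html", "text": "university admissions news campus open day"},
]

print(site_boundary(pages, seed_index=0))
```

With these invented pages, the three personal pages share both URL tokens and vocabulary, so they end up in one cluster and the two news pages in the other. Setting `w_text=0` or `w_url=0` reduces the sketch to a single-feature clustering, which is one simple way to compare individual features against combinations.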
Ayesh Alshukri, Frans Coenen, Michele Zito
Added 12 Oct 2010
Updated 12 Oct 2010
Type Conference
Year 2010