This paper considers the problem of identifying on the Web compound documents (cDocs) ? groups of web pages that in aggregate constitute semantically coherent information entities...
The advent of e-commerce has created a trend that brought thousands of catalogs online. Most of these websites are “taxonomy-directed”. A Web site is said to be ``taxonomydire...
In this paper we are interested in describing Web pages by how users interact within their contents. Thus, an alternate but complementary way of labelling and classifying Web docu...
We give a survey over the INEX initiative, which focuses on the evaluation of content -based access to XML documents. First, we describe the test setting and the various tracks of...
In recent years, there has been a prevalence of search engines being employed to find useful information in the Web as they efficiently explore hyperlinks between web pages which ...
Zhenglu Yang, Lin Li, Botao Wang, Masaru Kitsurega...