In recent years, several information retrieval methods using information about the Web-links are developed, such as HITS and Trawling. In order to analyze the Web-links dividing i...
With the growth of the Web, there has been a rapid increase in the number of users who need to access online databases without having a detailed knowledge of the schema or of quer...
Defining the boundaries of a web-site, for (say) archiving or information retrieval purposes, is an important but complicated task. In this paper a web-page clustering approach to...
Web forums have become an important data resource for many web applications, but extracting structured data from unstructured web forum pages is still a challenging task due to bo...
Jiang-Ming Yang, Rui Cai, Yida Wang, Jun Zhu, Lei ...
This paper considers the problem of identifying on the Web compound documents (cDocs) ? groups of web pages that in aggregate constitute semantically coherent information entities...