This paper considers the problem of identifying on the Web compound documents (cDocs) ? groups of web pages that in aggregate constitute semantically coherent information entities...
The challenge of automatically summarising Web pages and sites is a great one. However, currently there is no solution which offers an easy way to produce unbiased, coherent , and...
In this paper we propose a scaling-up method that is applicable to essentially any induction algorithm based on discrete search. The result of applying the method to an algorithm ...
Different from traditional information retrieval, both content and structure are critical to the success of Web information retrieval. In recent years, many relevance propagation ...
Tao Qin, Tie-Yan Liu, Xu-Dong Zhang, Zheng Chen, W...
Web spam is behavior that attempts to deceive search engine ranking algorithms. TrustRank is a recent algorithm that can combat web spam. However, TrustRank is vulnerable in the s...