An important requirement for emerging applications that aim to locate and integrate content distributed over the Web is to identify pages that are relevant to a given domain or ...
We describe an approach for constructing search spaces that consist of highly relevant web pages using similarities between the contents of linked web pages to represent their lin...
Aki Kobayashi, Kuangmin Tan, Katsunori Yamaoka, Yo...
Web spamming has emerged to exploit the economic advantage of high search rankings, and it threatens the accuracy and fairness of those rankings. Understanding spamming techniq...
This paper presents experiments on classifying web pages by genre. Firstly, a corpus of 1539 manually labeled web pages was prepared. Secondly, 502 genre features were selected ba...
The web graph follows a power-law distribution and has a hierarchical structure, but neither the PageRank algorithm nor any of its improvements leverages these attributes. In this p...
Yizhou Lu, Benyu Zhang, Wensi Xi, Zheng Chen, Yi L...