Link farm spam and replicated pages can greatly deteriorate link-based ranking algorithms like HITS. In order to identify and neutralize link farm spam and replicated pages, we lo...
Some previous works show that a web page can be partitioned to multiple segments or blocks, and usually the importance of those blocks in a page is not equivalent. Also, it is pro...
Ruihua Song, Haifeng Liu, Ji-Rong Wen, Wei-Ying Ma
Recently, semantic text portion (STP) is getting popular in the field of Web mining. STP is a text portion in the original page which is semantically related to the anchor pointing...
We consider the problem of clustering Web image search results. Generally, the image search results returned by an image search engine contain multiple topics. Organizing the resu...
Deng Cai, Xiaofei He, Zhiwei Li, Wei-Ying Ma, Ji-R...
We present a highly accurate method for classifying web pages based on link percentage, which is the percentage of text characters that are parts of links normalized by the number...