Block-based web search

The multiple-topic nature and varying length of web pages are two factors that significantly hurt web search performance. In this paper, we explore the use of page segmentation algorithms to partition web pages into blocks and investigate how to take advantage of block-level evidence to improve retrieval performance in the web context. Because of the special characteristics of web pages, different page segmentation methods have different impacts on web search performance. We compare four types of methods: fixed-length page segmentation, DOM-based page segmentation, vision-based page segmentation, and a combined method that integrates both semantic and fixed-length properties. We perform experiments on block-level query expansion and retrieval. Among the four approaches, the combined method achieves the best performance for web search. Our experimental results also show that such a semantic partitioning of web pages effectively deals with the problem of multiple d...
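As a rough illustration of the simplest of the four approaches, the sketch below splits a page's text into fixed-length word windows and scores the page by its best-matching block. It is not the authors' implementation: the abstract gives no window size, overlap, or scoring function, so the function names (`fixed_length_blocks`, `best_block_score`), the default window of 200 words, and the term-overlap score are all assumptions chosen for illustration.

```python
def fixed_length_blocks(text, window=200, overlap=0):
    """Split text into blocks of at most `window` words.

    Consecutive blocks share `overlap` words. Both parameters are
    illustrative assumptions, not values taken from the paper.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("require window > 0 and 0 <= overlap < window")
    words = text.split()
    step = window - overlap
    blocks = []
    for start in range(0, len(words), step):
        blocks.append(" ".join(words[start:start + window]))
        # Stop once this block has reached the end of the page.
        if start + window >= len(words):
            break
    return blocks


def best_block_score(query, blocks):
    """Score a page by its best block.

    The score of a block is the fraction of distinct query terms it
    contains (a stand-in for a real retrieval function).
    """
    terms = set(query.lower().split())
    if not terms or not blocks:
        return 0.0
    return max(
        len(terms & set(block.lower().split())) / len(terms)
        for block in blocks
    )


if __name__ == "__main__":
    page = "cooking recipes and kitchen tips web search engines rank pages"
    blocks = fixed_length_blocks(page, window=5)
    print(blocks)
    print(best_block_score("web search", blocks))
```

Scoring by the best block rather than the whole page means a short on-topic passage is not diluted by unrelated content elsewhere on a long, multi-topic page, which is the motivation the abstract gives for working at block level.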
Deng Cai, Shipeng Yu, Ji-Rong Wen, Wei-Ying Ma
Added 30 Jun 2010
Updated 30 Jun 2010
Type Conference
Year 2004