JUCS
2008

Recognising Informative Web Page Blocks Using Visual Segmentation for Efficient Information Extraction

Abstract: As web sites are getting more complicated, the construction of web information extraction systems becomes more troublesome and time-consuming. A common theme is the difficulty in locating the segments of a page in which the target information is contained, which we call the informative blocks. This article reports on the Recognising Informative Page Blocks algorithm (RIPB), which is able to identify the informative block in a web page so that information extraction algorithms can work on it more efficiently. RIPB relies on an existing algorithm for vision-based page block segmentation to analyse and partition a web page into a set of visual blocks, and then groups related blocks with similar content structures into block clusters by using a tree edit distance method. RIPB recognises the informative block cluster by using tree alignment and tree matching. A series of experiments were performed, and the conclusions were that RIPB was more than 95% accurate in recognising inform...
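The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which tree edit distance variant, similarity threshold, or clustering strategy RIPB uses, so everything below is an assumption made for illustration: a simplified top-down tree edit distance (edits apply to whole subtrees), a distance normalised by combined tree size, and a greedy single-pass clustering. The names `tree_distance`, `normalised_distance`, `cluster_blocks`, and the `threshold` value are hypothetical and do not come from the paper. Visual blocks are assumed to be already segmented and represented as nested `(tag, children)` tuples.

```python
from functools import lru_cache

# Assumed representation: a block is a tree (tag, (child, child, ...)),
# built from nested tuples so that it is hashable and can be memoised.


def size(tree):
    """Number of nodes in the tree."""
    _tag, children = tree
    return 1 + sum(size(child) for child in children)


@lru_cache(maxsize=None)
def tree_distance(a, b):
    """Simplified top-down tree edit distance (an assumption, not RIPB's exact method).

    Relabelling a node costs 1; inserting or deleting a subtree costs its size.
    """
    (tag_a, kids_a), (tag_b, kids_b) = a, b
    root_cost = 0 if tag_a == tag_b else 1
    m, n = len(kids_a), len(kids_b)

    # Sequence edit distance over the two child lists.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + size(kids_a[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + size(kids_b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + size(kids_a[i - 1]),   # delete subtree from a
                d[i][j - 1] + size(kids_b[j - 1]),   # insert subtree from b
                d[i - 1][j - 1] + tree_distance(kids_a[i - 1], kids_b[j - 1]),
            )
    return root_cost + d[m][n]


def normalised_distance(a, b):
    """Scale the distance into [0, 1) by the combined size of both trees."""
    return tree_distance(a, b) / (size(a) + size(b))


def cluster_blocks(blocks, threshold=0.3):
    """Greedy clustering: a block joins the first cluster whose first member
    is within the (hypothetical) threshold, otherwise it starts a new cluster."""
    clusters = []
    for block in blocks:
        for cluster in clusters:
            if normalised_distance(cluster[0], block) <= threshold:
                cluster.append(block)
                break
        else:
            clusters.append([block])
    return clusters


# Two structurally similar "result item" blocks and one navigation list.
item_a = ("div", (("a", ()), ("span", ())))
item_b = ("div", (("a", ()), ("span", ()), ("img", ())))
nav = ("ul", (("li", ()), ("li", ()), ("li", ()), ("li", ())))

clusters = cluster_blocks([item_a, nav, item_b])
print(len(clusters))
```

In this toy input the two item blocks differ by one inserted leaf, so they fall into the same cluster, while the navigation list forms its own. Selecting which cluster is the informative one (the tree alignment and tree matching step mentioned in the abstract) is not covered by this sketch.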
Jinbeom Kang, Joongmin Choi
Added 13 Dec 2010
Updated 13 Dec 2010
Type Journal
Year 2008
Where JUCS