
WWW
2008
ACM

Web page sectioning using regex-based template

This work presents a novel, site-specific algorithm for web page segmentation and section importance detection that leverages structural, content, and visual information. Structural and content information is captured by a template, a generalized regular expression learned over a set of pages. Combining the template with visual information yields high sectioning accuracy. Experimental results demonstrate the effectiveness of the approach.
Categories and Subject Descriptors: H.3.3 [Information Storage, Retrieval]: Information Extraction
General Terms: Algorithms, Design
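
The abstract does not spell out how the template is learned, so the following is only a rough illustration of the general idea of a "generalized regular expression learned over a set of pages", not the authors' algorithm. It assumes a page can be reduced to its sequence of start-tag names, folds consecutive repeats into a `+` unit, and marks tags missing from some pages as optional. All function names (`tag_sequence`, `collapse_runs`, `merge`, `learn_template`, `matches`) are hypothetical, and the sketch ignores the content and visual cues and the section importance step that the paper also relies on.

```python
import re
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Collects start-tag names in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    """Reduce a page to its list of start-tag names (an assumed simplification)."""
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def collapse_runs(tags):
    """Fold consecutive repeats: ['li', 'li', 'li'] -> [('li', True)]."""
    out = []
    for tag in tags:
        if out and out[-1][0] == tag:
            out[-1] = (tag, True)
        else:
            out.append((tag, False))
    return out


def merge(template, page):
    """Align the template with one more page.

    template: list of (tag, repeat, optional); page: list of (tag, repeat).
    Tokens that appear on only one side of the alignment become optional.
    """
    a = [t[0] for t in template]
    b = [p[0] for p in page]
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    merged = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            for k in range(i2 - i1):
                tag, rep, opt = template[i1 + k]
                merged.append((tag, rep or page[j1 + k][1], opt))
        else:
            merged += [(tag, rep, True) for tag, rep, _ in template[i1:i2]]
            merged += [(tag, rep, True) for tag, rep in page[j1:j2]]
    return merged


def to_regex(template):
    """Render the template as a regex over space-terminated tag names."""
    parts = []
    for tag, rep, opt in template:
        unit = f"(?:{re.escape(tag)} )"
        if rep and opt:
            unit += "*"
        elif rep:
            unit += "+"
        elif opt:
            unit += "?"
        parts.append(unit)
    return "".join(parts)


def learn_template(pages):
    """Learn one regex template from a non-empty list of HTML strings."""
    seqs = [collapse_runs(tag_sequence(p)) for p in pages]
    template = [(tag, rep, False) for tag, rep in seqs[0]]
    for seq in seqs[1:]:
        template = merge(template, seq)
    return to_regex(template)


def matches(template_regex, html):
    """Check whether a page's tag sequence fits the learned template."""
    text = "".join(tag + " " for tag in tag_sequence(html))
    return re.fullmatch(template_regex, text) is not None
```

Under these assumptions, calling `learn_template` on two pages from the same site that differ only in an extra `<p>` block and in the number of `<li>` items produces a template in which the `p` unit is optional and the `li` unit is repeatable, so `matches` accepts a third page with the same layout and rejects a page with an unrelated structure.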
Rupesh R. Mehta, Amit Madaan
Added 21 Nov 2009
Updated 21 Nov 2009
Type Conference
Year 2008
Where WWW