Sciweavers

DOCENG
2009
ACM

Web article extraction for web printing: a DOM+visual based approach

Ping Luo, Jian Fan, Sam Liu, Fen Lin, Yuhong Xiong, Jerry Liu. HP Laboratories, HPL-2009-185. Keywords: article extraction, maximal scoring subsequence.

This work studies the problem of extracting articles from Web pages for better printing. Unlike the settings targeted by existing article extraction approaches, Web printing poses several unique requirements: 1) Identifying just the boundary surrounding the text body is not ideal; it is highly desirable to also filter out uninformative links and advertisements within this boundary. 2) Paragraphs, which may not be readily separated as DOM nodes, must be identified so that the article can be laid out well. 3) Performance should be independent of content domain, written language, and Web page template. Toward these goals we propose a novel method of article extraction using both DOM (Document Object Model) and visual features. The mai...
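The keywords name "maximal scoring subsequence", but the truncated abstract does not say how the method applies it. As a generic illustration only, and not the authors' algorithm, the sketch below finds the contiguous run with the highest total in a list of per-block scores. The scores, and the idea that text-like blocks score positive while link-heavy or advertisement blocks score negative, are assumptions made for this example.

```python
from typing import List, Tuple


def max_scoring_subsequence(scores: List[float]) -> Tuple[int, int, float]:
    """Return (start, end, total) for the contiguous run with the highest sum.

    `end` is exclusive. If no score is positive, the empty run (0, 0, 0.0)
    is returned.
    """
    best_start, best_end, best_total = 0, 0, 0.0
    run_start, run_total = 0, 0.0
    for i, score in enumerate(scores):
        # A run whose total has dropped to zero or below cannot help any
        # later run, so restart the candidate run at the current position.
        if run_total <= 0:
            run_start, run_total = i, 0.0
        run_total += score
        if run_total > best_total:
            best_start, best_end, best_total = run_start, i + 1, run_total
    return best_start, best_end, best_total


# Hypothetical per-block scores for one page, in document order.
block_scores = [-2.0, 3.0, 4.0, -1.0, 5.0, -6.0, 1.0]
start, end, total = max_scoring_subsequence(block_scores)
print(start, end, total)
```

In this example the selected run covers blocks 1 through 4 and keeps the slightly negative block at index 3, because dropping it would split the run and lower the total. The scan is a single pass over the scores, so it runs in linear time.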
Added 28 May 2010
Updated 28 May 2010
Type Conference
Year 2009
Where DOCENG
Authors Ping Luo, Jian Fan, Sam Liu, Fen Lin, Yuhong Xiong, Jerry Liu