Sciweavers

IJDAR 2007
User-driven page layout analysis of historical printed books
In this paper, based on a study of the specific characteristics of historical printed books, we first explain the main error sources in classical methods used for page layout analysis. We sho...
Jean-Yves Ramel, S. Leriche, M. L. Demonet, S. Bus...

IPM 2006
Dictionary-based text categorization of chemical web pages
A new dictionary-based text categorization approach is proposed to classify chemical web pages efficiently. Using a chemistry dictionary, the approach can extract chemistry-re...
Chunyan Liang, Li Guo, Zhaojie Xia, Feng-Guang Nie...

WWW 2005 (ACM)
Web data extraction based on partial tree alignment
This paper studies the problem of extracting data from a Web page that contains several structured data records. The objective is to segment these data records, extract data items...
Yanhong Zhai, Bing Liu

WWW 2006 (ACM)
Using graph matching techniques to wrap data from PDF documents
Wrapping is the process of navigating a data source, semiautomatically extracting data and transforming it into a form suitable for data processing applications. There are current...
Tamir Hassan, Robert Baumgartner

CICLing 2009 (Springer)
Language Identification on the Web: Extending the Dictionary Method
Automated language identification of written text is a well-established research domain that has received considerable attention in the past. By now, efficient and effecti...
Radim Rehurek, Milan Kolkus