Search Sciweavers | Sciweavers

16 search results - page 3 / 4

» Recognising Informative Web Page Blocks Using Visual Segment...

IJDAR 2007

69 views

User-driven page layout analysis of historical printed books

13 years 4 months ago

Download hal.archives-ouvertes.fr

In this paper, based on the study of the specificity of historical printed books, we first explain the main error sources in classical methods used for page layout analysis. We sho...

Jean-Yves Ramel, S. Leriche, M. L. Demonet, S. Bus...

IPM 2006

146 views

Dictionary-based text categorization of chemical web pages

13 years 4 months ago

Download chemport.ipe.ac.cn

A new dictionary-based text categorization approach is proposed to classify the chemical web pages efficiently. Using a chemistry dictionary, the approach can extract chemistry-re...

Chunyan Liang, Li Guo, Zhaojie Xia, Feng-Guang Nie...

WWW 2005, ACM

135 views, Internet Technology

Web data extraction based on partial tree alignment

14 years 5 months ago

Download www.cs.uic.edu

This paper studies the problem of extracting data from a Web page that contains several structured data records. The objective is to segment these data records, extract data items...

Yanhong Zhai, Bing Liu

WWW 2006, ACM

112 views, Internet Technology

Using graph matching techniques to wrap data from PDF documents

14 years 5 months ago

Download rewerse.net

Wrapping is the process of navigating a data source, semiautomatically extracting data and transforming it into a form suitable for data processing applications. There are current...

Tamir Hassan, Robert Baumgartner

CICLING 2009, Springer

335 views, Natural Language Processing

Language Identification on the Web: Extending the Dictionary Method

13 years 8 months ago

Download www.fi.muni.cz

Automated language identification of written text is a well-established research domain that has received considerable attention in the past. By now, efficient and effecti...

Radim Rehurek, Milan Kolkus
