Search Sciweavers | Sciweavers

21 search results - page 3 / 5

» Title extraction from bodies of HTML documents and its appli...

CIKM
2008
Springer

156 views, Information Technology

A densitometric approach to web page segmentation

13 years 7 months ago

Download www.l3s.de

Web Page segmentation is a crucial step for many applications in Information Retrieval, such as text classification, de-duplication and full-text search. In this paper we describe...

Christian Kohlschütter, Wolfgang Nejdl

CICLING
2009
Springer

140 views, Natural Language Processing

Business Specific Online Information Extraction from German Websites

14 years 6 months ago

Download www.cis.uni-muenchen.de

This paper presents a system that uses the domain name of a German business website to locate its information pages (e.g. company profile, contact page, imprint) and then identifi...

Yeong Su Lee, Michaela Geierhos

COOPIS
1999
IEEE

107 views, Information Technology

Looking at the Web through XML Glasses

13 years 9 months ago

Download db.cis.upenn.edu

The Web so far has been incredibly successful at delivering information to human users. So successful actually, that there is now an urgent need to go beyond a browsing human and ...

Arnaud Sahuguet, Fabien Azavant

DOCENG
2009
ACM

166 views, Document Analysis

Object-level document analysis of PDF files

14 years 1 day ago

Download www.dbai.tuwien.ac.at

The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...

Tamir Hassan

PVLDB
2008

141 views

WebTables: exploring the power of tables on the web

13 years 5 months ago

Download turing.cs.washington.edu

The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...

Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
