Sciweavers

21 search results - page 3 / 5
» Title extraction from bodies of HTML documents and its appli...
Sort
View
CIKM
2008
Springer
13 years 7 months ago
A densitometric approach to web page segmentation
Web Page segmentation is a crucial step for many applications in Information Retrieval, such as text classification, de-duplication and full-text search. In this paper we describe...
Christian Kohlschütter, Wolfgang Nejdl
CICLING
2009
Springer
14 years 6 months ago
Business Specific Online Information Extraction from German Websites
This paper presents a system that uses the domain name of a German business website to locate its information pages (e.g. company profile, contact page, imprint) and then identifi...
Yeong Su Lee, Michaela Geierhos
COOPIS
1999
IEEE
13 years 9 months ago
Looking at the Web through XML Glasses
The Web so far has been incredibly successful at delivering information to human users. So successful actually, that there is now an urgent need to go beyond a browsing human and ...
Arnaud Sahuguet, Fabien Azavant
DOCENG
2009
ACM
14 years 1 days ago
Object-level document analysis of PDF files
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
Tamir Hassan
PVLDB
2008
141views more  PVLDB 2008»
13 years 5 months ago
WebTables: exploring the power of tables on the web
The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...