Sciweavers

85 search results - page 5 / 17
» Extracting unstructured data from template generated web doc...
ICDE
2006
IEEE
Automatic Sales Lead Generation from Web Data
Speed to market is critical to companies that are driven by sales in a competitive market. The earlier a potential customer can be approached in the decision making process of a p...
Ganesh Ramakrishnan, Sachindra Joshi, Sumit Negi, ...
SIGMOD
2000
ACM
XTRACT: A System for Extracting Document Type Descriptors from XML Documents
XML is rapidly emerging as the new standard for data representation and exchange on the Web. An XML document can be accompanied by a Document Type Descriptor (DTD) which plays the...
Minos N. Garofalakis, Aristides Gionis, Rajeev Ras...
KDD
2007
ACM
Joint optimization of wrapper generation and template detection
Many websites have large collections of pages generated dynamically from an underlying structured source like a database. The data of a category are typically encoded into similar...
Shuyi Zheng, Ruihua Song, Ji-Rong Wen, Di Wu
ACMICEC
2006
ACM
From HTML documents to web tables and rules
We present a browser-extending Semantic Web extraction system that maps HTML documents to tables and, where possible, to rules. First, the basic data extractor ViPER distills and ...
Kai Simon, Georg Lausen, Harold Boley
WWW
2006
ACM
Using graph matching techniques to wrap data from PDF documents
Wrapping is the process of navigating a data source, semiautomatically extracting data and transforming it into a form suitable for data processing applications. There are current...
Tamir Hassan, Robert Baumgartner