PODS
2004
ACM

The Lixto Data Extraction Project - Back and Forth between Theory and Practice

We present the Lixto project, which is both a research project in database theory and a commercial enterprise that develops Web data extraction (wrapping) and Web service definition software. We discuss the project's main motivations and ideas, in particular the use of a logic-based framework for wrapping. Then we present theoretical results on monadic datalog over trees and on Elog, its close relative, which is used as the internal wrapper language in the Lixto system. These results characterize both the expressive power and the complexity of these languages. We describe the visual wrapper specification process in Lixto and various practical aspects of wrapping. We discuss work on the complexity of query languages for trees that was inspired by our theoretical study of logic-based languages for wrapping. Then we return to the practice of wrapping and the Lixto Transformation Server, which allows for streaming integration of data extracted from Web pages. This ...
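The abstract names monadic datalog over trees as the basis of Lixto's wrapper language but shows no program. The following is a minimal sketch, not Lixto's implementation and not Elog syntax. It assumes a common encoding of a document tree by the binary relations `firstchild` and `nextsibling` plus one unary `label_<tag>` predicate per tag. Every rule head is unary, which is what makes the program monadic. The predicates `intable` and `cell` and the helpers `tree_relations` and `evaluate` are hypothetical names chosen for illustration. Evaluation uses a naive bottom-up fixpoint loop chosen for brevity. It makes no claim about the evaluation strategies or complexity results discussed in the paper.

```python
# Hypothetical sketch: naive bottom-up evaluation of a monadic datalog program
# over a tree encoded by firstchild/nextsibling and unary label predicates.

def tree_relations(tree):
    """tree is a nested tuple (label, [children]); nodes get preorder ids."""
    unary = {}
    binary = {"firstchild": set(), "nextsibling": set()}
    counter = [0]

    def visit(node):
        label, children = node
        nid = counter[0]
        counter[0] += 1
        unary.setdefault("label_" + label, set()).add(nid)
        child_ids = [visit(c) for c in children]
        if child_ids:
            binary["firstchild"].add((nid, child_ids[0]))
        for a, b in zip(child_ids, child_ids[1:]):
            binary["nextsibling"].add((a, b))
        return nid

    visit(tree)
    return unary, binary


def solve(body, env, facts, binary):
    """Yield every variable binding that satisfies all atoms in body."""
    if not body:
        yield env
        return
    atom, rest = body[0], body[1:]
    if len(atom) == 2:  # unary atom: (pred, var)
        pred, v = atom
        for n in facts.get(pred, ()):
            if env.get(v, n) == n:
                yield from solve(rest, {**env, v: n}, facts, binary)
    else:  # binary tree atom: (pred, var1, var2)
        pred, v1, v2 = atom
        for a, b in binary[pred]:
            if env.get(v1, a) == a and env.get(v2, b) == b:
                yield from solve(rest, {**env, v1: a, v2: b}, facts, binary)


def evaluate(rules, unary, binary):
    """Apply all rules until no new unary facts are derived (least fixpoint)."""
    facts = {p: set(s) for p, s in unary.items()}
    changed = True
    while changed:
        changed = False
        for head, head_var, body in rules:
            # Collect first so the fact sets are not mutated mid-iteration.
            derived = {env[head_var] for env in solve(body, {}, facts, binary)}
            target = facts.setdefault(head, set())
            if not derived <= target:
                target |= derived
                changed = True
    return facts


# Hypothetical program: select td nodes that lie anywhere below a table node.
RULES = [
    ("intable", "X", [("label_table", "P"), ("firstchild", "P", "X")]),
    ("intable", "X", [("intable", "P"), ("firstchild", "P", "X")]),
    ("intable", "X", [("intable", "S"), ("nextsibling", "S", "X")]),
    ("cell", "X", [("intable", "X"), ("label_td", "X")]),
]

DOC = ("html", [
    ("body", [
        ("td", []),  # outside any table: must not be selected
        ("table", [
            ("tr", [("td", []), ("td", [])]),
            ("tr", [("td", [])]),
        ]),
    ]),
])
```

For example, `evaluate(RULES, *tree_relations(DOC))["cell"]` yields the preorder ids `{5, 6, 8}`: the three `td` nodes inside the table, but not the stray `td` with id 2. Starting from the table's first child and then following only `firstchild` and `nextsibling` reaches exactly the table's descendants.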
Type Conference
Year 2004
Where PODS
Authors Georg Gottlob, Christoph Koch, Robert Baumgartner, Marcus Herzog, Sergio Flesca