At a fundamental level, the key challenge in data integration is to reconcile the semantics of disparate data sets, each expressed with a different database structure. I argue th...
With the rise of XML, the database community has been challenged by semi-structured data processing. Since the data type behind XML is the tree, state-of-the-art RDBMSs have learn...
Abstract: Data extraction is a necessary technology to deal with the huge and growing collection of unstructured and semistructured information available on the World Wide Web. Ont...
Stephen W. Liddle, Kimball A. Hewett, David W. Emb...
Similarity joins in databases can be used for several important tasks such as data cleaning and instance-based data integration. In this paper, we explore ways how to support such ...
When web servers publish data formatted in XML, only the current state of the data is (generally) published. But data evolves over time as it is updated. Capturing that evolution i...
Curtis E. Dyreson, Richard T. Snodgrass, Faiz Curr...