Most data integration applications require a matching between the schemas of the respective data sets. We show how the existence of duplicates within these data sets can be exploi...
Poor quality data is prevalent in databases due to a variety of reasons, including transcription errors, lack of standards for recording database fields, etc. To be able to query ...
Byung-Won On, Nick Koudas, Dongwon Lee, Divesh Sri...
Deduplication is a key operation in integrating data from multiple sources. The main challenge in this task is designing a function that can resolve when a pair of records refer t...
We propose a novel extraction approach that exploits content redundancy on the web to extract structured data from template-based web sites. We start by populating a seed database...
Pankaj Gulhane, Rajeev Rastogi, Srinivasan H. Seng...
An ecological-cognitive framework of analysis and a model-tracing architecture are presented and used in the analysis of data recorded from users browsing a large document collect...