Sciweavers

CLEANDB
2006
ACM
QUEST: QUery-driven Exploration of Semistructured Data with ConflicTs and Partial Knowledge
An important reality when integrating scientific data is the fact that data may often be "missing", partially specified, or conflicting. Therefore, in this paper, we pre...
Yan Qi, K. Selçuk Candan, Maria Luisa ...
Structure Aware XML Object Identification
Diego Milano, Monica Scannapieco, Tiziana Catarci
Generic Entity Resolution with Data Confidences
We consider the Entity Resolution (ER) problem (also known as deduplication, or merge-purge), in which records determined to represent the same real-world entity are successively ...
David Menestrina, Omar Benjelloun, Hector Garcia-M...
Efficiently Filtering RFID Data Streams
RFID holds the promise of real-time identifying, locating, tracking and monitoring physical objects without line of sight, and can be used for a wide range of pervasive computing ...
Yijian Bai, Fusheng Wang, Peiya Liu
In-network Outlier Cleaning for Data Collection in Sensor Networks
Outliers are very common in the environmental data monitored by a sensor network consisting of many inexpensive, low fidelity, and frequently failed sensors. The limited battery ...
Yongzhen Zhuang, Lei Chen
Cleansing Databases of Misspelled Proper Nouns
The paper presents a data cleansing technique for string databases. We propose and evaluate an algorithm that identifies a group of strings that consists of (multiple) occurrence...
Arturas Mazeika, Michael H. Böhlen
Circumventing Data Quality Problems Using Multiple Join Paths
We propose the Multiple Join Path (MJP) framework for obtaining high quality information by linking fields across multiple databases, when the underlying databases have poor qual...
Yannis Kotidis, Amélie Marian, Divesh Sriva...
Column Heterogeneity as a Measure of Data Quality
Data quality is a serious concern in every data management application, and a variety of quality measures have been proposed, including accuracy, freshness and completeness, to ca...
Bing Tian Dai, Nick Koudas, Beng Chin Ooi, Divesh ...