In this paper we present a method for automatically segmenting unformatted text records into structured elements. Several useful data sources today are human-generated as continuo...
Vinayak R. Borkar, Kaustubh Deshmukh, Sunita Saraw...
We present a formal framework for capturing the provenance of data appearing in XQuery views of XML. Building on previous work on relations and their (positive) query languages, w...
Data quality is a critical problem in modern databases. Data entry forms present the first and arguably best opportunity for detecting and mitigating errors, but there has been li...
Kuang Chen, Harr Chen, Neil Conway, Joseph M. Hell...
This paper introduces the problem of modeling urban transportation systems in a database where certain aspects of the data are probabilistic in nature. The transportation network ...
Joel Booth, A. Prasad Sistla, Ouri Wolfson, Isabel...
Subspace clustering is an extension of traditional clustering that seeks to find clusters in different subspaces within a dataset. This is a particularly important challenge with...