Many important problems involve clustering large datasets. Although naive implementations of clustering are computationally expensive, there are established efficient techniques f...
A lot of recent research has focused on the content-based dissemination of XML data. However, due to the heterogeneous data schemas used by different data publishers even for data...
Like HTML, many XML documents are resident on native file systems. Since XML data is irregular and verbose, the disk space and the network bandwidth are wasted. To overcome the ve...
In recent years, the management and processing of so-called data streams has become a topic of active research in several fields of computer science such as, e.g., distributed sys...
Data quality is a critical problem in modern databases. Data entry forms present the first and arguably best opportunity for detecting and mitigating errors, but there has been li...
Kuang Chen, Harr Chen, Neil Conway, Joseph M. Hell...