Reference reconciliation is the problem of identifying when different references (i.e., sets of attribute values) in a dataset correspond to the same real-world entity. Most previ...
Abstract. Clustering is a problem of great practical importance in numerous applications. The problem of clustering becomes more challenging when the data is categorical, that is, ...
In 1987, Gray and Putzolo presented the five-minute rule, which was reviewed and renewed ten years later in 1997. With the advent of flash memory in the gap between traditional RA...
Information systems often require combining datasets available in different formats, and geographical information systems are no exception. While semantic technologies have been u...
We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalizatio...