We examine linear regression problems where some features may only be observable at a cost (e.g., in medical domains where features may correspond to diagnostic tests that take ti...
Given a spatial data set placed on an n × n grid, our goal is to find the rectangular regions within which subsets of the data set exhibit anomalous behavior. We develop algorithm...
Mingxi Wu, Xiuyao Song, Chris Jermaine, Sanjay Ran...
There has been much recent interest in adapting data mining algorithms to time series databases. Most of these algorithms need to compare time series. Typically some variation of ...
The k-means algorithm is widely used for clustering, compressing, and summarizing vector data. In this paper, we propose a new acceleration for exact k-means that gives the same a...
We believe that the possibility to use SPARQL as a front end to heterogeneous data without significant cost in performance or expressive power is key to RDF taking its rightful pla...