In recent years, anonymization methods have emerged as an important tool to preserve individual privacy when releasing privacy sensitive data sets. This interest in anonymization t...
Clustering is one of the most important tasks performed in Data Mining applications. This paper presents an e cient SQL implementation of the EM algorithm to perform clustering in...
Consider spatial data consisting of a set of binary features taking values over a collection of spatial extents (grid cells). We propose a method that simultaneously finds spatia...
DryadLINQ is a system and a set of language extensions that enable a new programming model for large scale distributed computing. It generalizes previous execution environments su...
Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Bud...
In this paper, we describe a query system that provides visual relevance feedback in querying large databases. Our goal is to support the process of data mining by representing as...