The Dryad and DryadLINQ systems offer a new programming model for large scale data-parallel computing. They generalize previous execution environments such as SQL and MapReduce in...
The DataLinks technology developed at IBM Almaden Research Center and now available in DB2 UDB 5.2 introduces a new data type called DATALINK for a database to reference and manag...
Background: We propose a sequence clustering algorithm and compare the partition quality and execution time of the proposed algorithm with those of a popular existing algorithm. T...
David J. Russell, Samuel F. Way, Andrew K. Benson,...
As large-scale databases become commonplace, there has been signi cant interest in mining them for commercial purposes. One of the basic tasks that underlies many of these mining ...
The similarity join is an important operation for mining high-dimensional feature spaces. Given two data sets, the similarity join computes all tuples (x, y) that are within a dis...