Top Down Induction of Decision Trees (TDIDT) is the most commonly used method of constructing a model from a dataset in the form of classification rules to classify previously unse...
The similarity join is an important database primitive which has been successfully applied to speed up applications such as similarity search, data analysis and data mining. The s...
Parallel dataflow programming frameworks such as Map-Reduce are increasingly being used for large scale data analysis on computing clouds. It is therefore becoming important to a...
Relational graphs are widely used in modeling large scale networks such as biological networks and social networks. In this kind of graph, connectivity becomes critical in identif...
The Dryad and DryadLINQ systems offer a new programming model for large scale data-parallel computing. They generalize previous execution environments such as SQL and MapReduce in...