We consider the problem of joining massive datasets. We propose two techniques for minimizing disk I/O cost of join operations for both spatial and sequence data. Our techniques o...
The similarity join is an important database primitive which has been successfully applied to speed up applications such as similarity search, data analysis and data mining. The s...
Many emerging data mining applications require a similarity join between points in a high-dimensional domain. We present a new algorithm that utilizes a new index structure, calle...
Microarray datasets are often too large to visualise due to the high dimensionality. The self-organising map has been found useful to analyse massive complex datasets. It can be us...
In this paper, we devise an efficient algorithm for clustering market-basket data. Different from those of the traditional data, the features of market-basket data are known to b...