Discovering patterns in a sequence is an important aspect of data mining. One popular choice of such patterns are episodes, patterns in sequential data describing events that often...
We consider a new data mining problem of detecting the members of a rare class of data, the needles, that have been hidden in a set of records, the haystack. Besides the haystack, ...
A new approach to the induction of multivariate decision trees is proposed. A linear decision function (hyper-plane) is used at each non-terminal node of a binary tree for splittin...
Stability is an important yet under-addressed issue in feature selection from high-dimensional and small sample data. In this paper, we show that stability of feature selection ha...
In this paper, we propose GAD (General Activity Detection) for fast clustering on large scale data. Within this framework we design a set of algorithms for different scenarios: (...
Jiawei Han, Liangliang Cao, Sangkyum Kim, Xin Jin,...