Text classification poses some specific challenges. One such challenge is its high dimensionality where each document (data point) contains only a small subset of them. In this pap...
We study the problem of estimating the Earth Mover’s Distance (EMD) between probability distributions when given access only to samples of the distributions. We give closeness t...
Khanh Do Ba, Huy L. Nguyen, Huy N. Nguyen, Ronitt ...
High-performance computing clusters (commodity hardware with low-latency, high-bandwidth interconnects) based on Linux, are rapidly becoming the dominant computing platform for a ...
The Denclue algorithm employs a cluster model based on kernel density estimation. A cluster is defined by a local maximum of the estimated density function. Data points are assign...
We study an algorithm for feature selection that clusters attributes using a special metric and then makes use of the dendrogram of the resulting cluster hierarchy to choose the m...
Richard Butterworth, Gregory Piatetsky-Shapiro, Da...