Clustering is one of the most widely used statistical tools for data analysis. Among all existing clustering techniques, k-means is a very popular method because of its ease of pr...
The general approach for automatically driving data collection using information from previously acquired data is called active learning. Traditional active learning addresses the...
The conditional distribution of a discrete variable y, given another discrete variable x, is often specified by assigning one multinomial distribution to each state of x. The cost...
We present a directed Markov random field (MRF) model that combines n-gram models, probabilistic context free grammars (PCFGs) and probabilistic latent semantic analysis (PLSA) fo...
Shaojun Wang, Shaomin Wang, Russell Greiner, Dale ...
Despite of the large number of algorithms developed for clustering, the study on comparing clustering results is limited. In this paper, we propose a measure for comparing cluster...