As text corpora become larger, tradeoffs between speed and accuracy become critical: slow but accurate methods may not complete in a practical amount of time. In order to make the...
Lawrence Shih, Jason D. Rennie, Yu-Han Chang, Davi...
Topic models provide a powerful tool for analyzing large text collections by representing high dimensional data in a low dimensional subspace. Fitting a topic model given a set of...
In this paper we address the problem of combining multiple clusterings without access to the underlying features of the data. This process is known in the literature as clustering...
Algorithms for tracking concept drift are important for many applications. We present a general method based on the Weighted Majority algorithm for using any online learner for co...
— One way to handle data mining problems where class prior probabilities and/or misclassification costs between classes are highly unequal is to resample the data until a new, d...