We present a semantic caching approach for optimizing the performance of information mediators. A critical problem with information mediators, particularly those gathering and int...
Traditional co-clustering methods identify block structures from static data matrices. However, the data matrices in many applications are dynamic; that is, they evolve smoothly o...
This paper presents Clusterfile, a parallel file system that provides parallel file access on a cluster of computers. Existing parallel file systems offer little control over matc...
Discovering rare categories and classifying new instances of them is an important data mining issue in many fields, but fully supervised learning of a rare class classifier is pr...
We propose a new unsupervised learning technique for extracting information from large text collections. We model documents as if they were generated by a two-stage stochastic pro...
Mark Steyvers, Padhraic Smyth, Michal Rosen-Zvi, T...