Generalization from Observed to Unobserved Features by Clustering

We argue that when objects are characterized by many attributes, clustering them on the basis of a random subset of these attributes can capture information on the unobserved attributes as well. Moreover, we show that under mild technical conditions, clustering the objects on the basis of such a random subset performs almost as well as clustering with the full attribute set. For this novel learning scheme, we prove finite-sample generalization theorems that extend analogous results from the supervised learning setting. We use our framework to analyze generalization to unobserved features of two well-known clustering algorithms: k-means and the maximum likelihood multinomial mixture model. The scheme is demonstrated for collaborative filtering of users with movie ratings as attributes and document clustering with words as attributes.
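The central claim of the abstract can be illustrated with a small sketch. This is not the authors' code or experimental setup. It assumes synthetic Gaussian data with a hidden two-group structure, a plain Lloyd's k-means, and mean squared distance to the cluster mean as a stand-in quality measure. The helper names (`kmeans`, `distortion`, `project`) are hypothetical. Objects are clustered using only a random subset of attributes, and the resulting labels are then scored on the attributes that were never observed.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns one label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def distortion(points, labels):
    """Mean squared distance of each point to the mean of its own cluster."""
    total = 0.0
    for c in set(labels):
        members = [p for p, l in zip(points, labels) if l == c]
        mean = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sum((x - m) ** 2 for x, m in zip(p, mean)) for p in members)
    return total / len(points)


def project(rows, cols):
    """Keep only the attribute columns listed in cols."""
    return [[row[j] for j in cols] for row in rows]


rng = random.Random(0)
n_objects, n_features, n_observed = 200, 40, 10

# Assumption: a hidden group (i % 2) shifts the mean of every attribute.
data = [[rng.gauss(2.0 * (i % 2), 1.0) for _ in range(n_features)]
        for i in range(n_objects)]

# Random split into observed and unobserved attributes.
observed = rng.sample(range(n_features), n_observed)
unobserved = [j for j in range(n_features) if j not in observed]

# Cluster on the observed attributes only, then score on the unobserved ones.
labels = kmeans(project(data, observed), k=2)
held_out = distortion(project(data, unobserved), labels)
baseline = distortion(project(data, unobserved), [0] * n_objects)

print(f"held-out distortion: {held_out:.2f} "
      f"(single-cluster baseline: {baseline:.2f})")
```

A held-out distortion below the single-cluster baseline indicates that the labels learned from the observed attributes carry information about the unobserved ones. The sketch shows only this qualitative effect on toy data and says nothing about the finite-sample bounds proved in the paper.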
Eyal Krupka, Naftali Tishby
Added 13 Dec 2010
Updated 13 Dec 2010
Type Journal
Year 2008
Where JMLR