Multi-view Regression Via Canonical Correlation Analysis

In the multi-view regression problem, the input variable (a real vector) can be partitioned into two different views, and it is assumed that either view of the input alone is sufficient to make accurate predictions. This is essentially a significantly weaker version of the co-training assumption, stated for regression. We provide a semi-supervised algorithm that first uses unlabeled data to learn a norm (or, equivalently, a kernel) and then uses labeled data in a ridge regression algorithm (with this induced norm) to produce the predictor. The unlabeled data is used via canonical correlation analysis (CCA, which is closely related to PCA but is defined for a pair of random variables) to derive an appropriate norm over functions. We are able to characterize the intrinsic dimensionality of the subsequent ridge regression problem (which uses this norm) by a rather simple expression in the correlation coefficients provided by CCA. Interestingly, the norm u...
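The two-stage pipeline described above (CCA on unlabeled pairs, then ridge regression under the induced norm) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. CCA is computed with the standard whitening-plus-SVD construction; only view 1 is used for the labeled regression; and each canonical coordinate i is penalized with the weight (1 - λ_i)/λ_i, where λ_i is its canonical correlation, so directions on which the two views agree only weakly are shrunk hardest. That weighting is an assumption made for illustration, since the abstract does not state the exact norm. The helper names `cca`, `fit_cca_ridge` and `predict`, the parameters `reg`, `alpha` and `eps`, and the synthetic data are all hypothetical.

```python
import numpy as np


def cca(X1, X2, reg=1e-6):
    """Canonical correlation analysis via whitening + SVD.

    Returns the view-1 mean, the view-1 canonical directions A (one per
    column) and the canonical correlations in decreasing order.
    """
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    X1c, X2c = X1 - mu1, X2 - mu2
    n = X1.shape[0]
    # Sample (cross-)covariances; a tiny ridge keeps the whitening stable.
    C11 = X1c.T @ X1c / n + reg * np.eye(X1.shape[1])
    C22 = X2c.T @ X2c / n + reg * np.eye(X2.shape[1])
    C12 = X1c.T @ X2c / n

    def inv_sqrt(C):
        # Symmetric inverse square root of a positive-definite matrix.
        w, V = np.linalg.eigh(C)
        return (V / np.sqrt(w)) @ V.T

    W1, W2 = inv_sqrt(C11), inv_sqrt(C22)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, _ = np.linalg.svd(W1 @ C12 @ W2, full_matrices=False)
    A = W1 @ U  # maps centered view-1 inputs to canonical coordinates
    return mu1, A, np.clip(s, 0.0, 1.0)


def fit_cca_ridge(X1_lab, y, mu1, A, corrs, alpha=1.0, eps=1e-6):
    """Ridge regression on view 1 in CCA coordinates.

    ASSUMPTION (hypothetical weighting): coordinate i is penalized by
    (1 - lambda_i) / lambda_i, so weakly correlated directions shrink most.
    """
    Z = (X1_lab - mu1) @ A
    y_mean = y.mean()
    lam = np.clip(corrs, eps, 1.0)
    penalty = (1.0 - lam) / lam
    beta = np.linalg.solve(Z.T @ Z + alpha * np.diag(penalty), Z.T @ (y - y_mean))
    return beta, y_mean


def predict(X1, mu1, A, beta, y_mean):
    # Project onto the learned canonical coordinates, then apply the ridge weights.
    return (X1 - mu1) @ A @ beta + y_mean


rng = np.random.default_rng(0)


def sample(n):
    # Hypothetical two-view data: one shared latent z drives both views and y.
    z = rng.normal(size=(n, 1))
    X1 = z @ np.ones((1, 5)) + 0.1 * rng.normal(size=(n, 5))
    X2 = z @ np.ones((1, 4)) + 0.1 * rng.normal(size=(n, 4))
    y = z[:, 0] + 0.1 * rng.normal(size=n)
    return X1, X2, y


X1_u, X2_u, _ = sample(2000)  # "unlabeled" pairs: labels are discarded
mu1, A, corrs = cca(X1_u, X2_u)
X1_l, _, y_l = sample(50)  # small labeled set, view 1 only
beta, y_mean = fit_cca_ridge(X1_l, y_l, mu1, A, corrs)
X1_t, _, y_t = sample(500)
pred = predict(X1_t, mu1, A, beta, y_mean)
r2 = 1 - np.sum((y_t - pred) ** 2) / np.sum((y_t - y_t.mean()) ** 2)
print("canonical correlations:", np.round(corrs, 3))
print("test R^2:", round(float(r2), 3))
```

On this synthetic data the top canonical correlation should be close to 1 and the others near 0, so the assumed penalty leaves the shared direction almost unregularized while shrinking the noise directions. The norm is learned in the sense that the penalty weights depend only on the unlabeled data; `alpha` plays the usual role of the ridge strength.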

Sham M. Kakade, Dean P. Foster
