Sciweavers

NIPS
2004

Co-Training and Expansion: Towards Bridging Theory and Practice

13 years 5 months ago
Co-Training and Expansion: Towards Bridging Theory and Practice
Co-training is a method for combining labeled and unlabeled data when examples can be thought of as containing two distinct sets of features. It has had a number of practical successes, yet previous theoretical analyses have needed very strong assumptions on the data that are unlikely to be satisfied in practice. In this paper, we propose a much weaker "expansion" assumption on the underlying data distribution, that we prove is sufficient for iterative cotraining to succeed given appropriately strong PAC-learning algorithms on each feature set, and that to some extent is necessary as well. This expansion assumption in fact motivates the iterative nature of the original co-training algorithm, unlike stronger assumptions (such as independence given the label) that allow a simpler one-shot co-training to succeed. We also heuristically analyze the effect on performance of noise in the data. Predicted behavior is qualitatively matched in synthetic experiments on expander graphs.
Maria-Florina Balcan, Avrim Blum, Ke Yang
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2004
Where NIPS
Authors Maria-Florina Balcan, Avrim Blum, Ke Yang
Comments (0)