Co-Training and Expansion: Towards Bridging Theory and Practice

Co-training is a method for combining labeled and unlabeled data when examples can be thought of as containing two distinct sets of features. It has had a number of practical successes, yet previous theoretical analyses have needed very strong assumptions on the data that are unlikely to be satisfied in practice. In this paper, we propose a much weaker "expansion" assumption on the underlying data distribution, which we prove is sufficient for iterative co-training to succeed given appropriately strong PAC-learning algorithms on each feature set, and which to some extent is necessary as well. This expansion assumption in fact motivates the iterative nature of the original co-training algorithm, unlike stronger assumptions (such as independence given the label) that allow a simpler one-shot co-training to succeed. We also heuristically analyze the effect of noise in the data on performance. The predicted behavior is qualitatively matched in synthetic experiments on expander graphs.
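The contrast between iterative and one-shot co-training drawn in the abstract can be illustrated with a toy sketch. The code below is not the paper's algorithm. It assumes that each example is a pair of discrete view values, that the "learner" on each view simply memorizes values seen alongside a value the other view already trusts, and that positive and negative examples never share a view value. The function name `iterative_cotrain`, the parameter `max_rounds`, and the data in `PAIRS` are all hypothetical.

```python
def iterative_cotrain(examples, seed_view1, max_rounds=100):
    """Toy co-training by propagation over unlabeled (x1, x2) pairs.

    examples   -- list of (x1, x2) tuples, one value per view (unlabeled)
    seed_view1 -- set of view-1 values initially trusted to be positive
    max_rounds -- cap on the number of back-and-forth rounds (assumption)

    Returns the sets of view-1 and view-2 values trusted as positive.
    """
    conf1 = set(seed_view1)  # confident set of the view-1 hypothesis
    conf2 = set()            # confident set of the view-2 hypothesis
    for _ in range(max_rounds):
        # View 1 labels unlabeled examples; view 2 learns from them.
        new2 = {x2 for x1, x2 in examples if x1 in conf1}
        # View 2 labels unlabeled examples; view 1 learns from them.
        new1 = {x1 for x1, x2 in examples if x2 in new2}
        if new1 <= conf1 and new2 <= conf2:
            break  # neither confident set grew, so stop
        conf1 |= new1
        conf2 |= new2
    return conf1, conf2


# Hypothetical unlabeled data: the positive pairs form a chain, and the
# negative pairs (n*, m*) share no view values with the positive ones.
PAIRS = [
    ("a0", "b0"), ("a1", "b0"), ("a1", "b1"),
    ("a2", "b1"), ("a2", "b2"), ("a3", "b2"),
    ("n0", "m0"), ("n1", "m0"),
]

full1, full2 = iterative_cotrain(PAIRS, {"a0"})
shot1, shot2 = iterative_cotrain(PAIRS, {"a0"}, max_rounds=1)
print(sorted(full1), sorted(full2))
print(sorted(shot1), sorted(shot2))
```

In this chain-shaped example a single round only reaches the values adjacent to the seed, while repeated rounds eventually cover every positive value and never touch the negative ones. That is the intuition behind needing iteration when the data merely expand rather than satisfy a stronger condition such as independence given the label.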

Maria-Florina Balcan, Avrim Blum, Ke Yang

Co-training | NIPS 2004 | NIPS 2007 | Original Co-training Algorithm | Simpler One-shot Co-training |

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2004
Where	NIPS
Authors	Maria-Florina Balcan, Avrim Blum, Ke Yang
