Generalization Error Bounds Using Unlabeled Data

13 years 10 months ago


We present two new methods for obtaining generalization error bounds in a semi-supervised setting. Both methods are based on approximating the disagreement probability of pairs of classifiers using unlabeled data. The first method works in the realizable case. It suggests how the ERM principle can be refined using unlabeled data and has provable optimality guarantees when the number of unlabeled examples is large. Furthermore, the technique extends easily to cover active learning. A downside is that the method is of little use in practice due to its limitation to the realizable case. The idea in our second method is to use unlabeled data to transform bounds for randomized classifiers into bounds for simpler deterministic classifiers. As a concrete example of how the general method works in practice, we apply it to a bound based on cross-validation. The result is a semi-supervised bound for classifiers learned based on all the labeled data. The bound is easy to implement and apply...
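The core idea of estimating disagreement from unlabeled data can be illustrated with a minimal sketch. It is not the paper's actual bound: it only combines the triangle inequality err(f) <= err(g) + P(f(x) != g(x)) with a one-sided Hoeffding bound on the disagreement estimate. All function names, the threshold classifiers, and the default delta are assumptions made for illustration.

```python
import math


def empirical_disagreement(f, g, unlabeled):
    """Fraction of unlabeled points on which f and g predict differently."""
    return sum(1 for x in unlabeled if f(x) != g(x)) / len(unlabeled)


def disagreement_upper_bound(f, g, unlabeled, delta=0.05):
    """Upper bound on P(f(x) != g(x)) via the one-sided Hoeffding inequality.

    Holds with probability at least 1 - delta, assuming the unlabeled
    points are drawn i.i.d. and f, g were chosen independently of them.
    """
    m = len(unlabeled)
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * m))
    return min(1.0, empirical_disagreement(f, g, unlabeled) + slack)


def transfer_error_bound(error_bound_g, f, g, unlabeled, delta=0.05):
    """Turn an error bound for g into one for f.

    Uses err(f) <= err(g) + P(f(x) != g(x)). No labels are needed for
    the second term, which is why unlabeled data helps.
    """
    return min(1.0, error_bound_g + disagreement_upper_bound(f, g, unlabeled, delta))


# Hypothetical example: two threshold classifiers on the interval [0, 1).
f = lambda x: x > 0.5
g = lambda x: x > 0.6
unlabeled = [i / 10000 for i in range(10000)]  # stand-in for an unlabeled sample

bound_f = transfer_error_bound(0.10, f, g, unlabeled)
```

If the bound on g itself holds only with some probability, the two failure probabilities have to be combined, for example with a union bound, before stating a confidence level for the bound on f.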

Matti Kääriäinen
