Text categorization involves mapping of documents to a fixed set of labels. A similar but equally important problem is that of assigning labels to large corpora. With a deluge of ...
Multi-view algorithms, such as co-training and co-EM, utilize unlabeled data when the available attributes can be split into independent and compatible subsets. Co-EM outperforms ...
We investigate the problem of automatically labelling
faces of characters in TV or movie material with their
names, using only weak supervision from automaticallyaligned
subtitl...
The goal of semi-supervised learning (SSL) methods is to reduce the amount of labeled training data required by learning from both labeled and unlabeled instances. Macskassy and Pr...
Recently there has been significant interest in supervised learning algorithms that combine labeled and unlabeled data for text learning tasks. The co-training setting [1] applie...