Combining coregularization and consensus-based self-training for multilingual text categorization



We investigate the problem of learning document classifiers in a multilingual setting, from collections where labels are only partially available. We address this problem in the framework of multiview learning, where different languages correspond to different views of the same document, combined with semi-supervised learning in order to benefit from unlabeled documents. We rely on two techniques, coregularization and consensus-based self-training, that combine multiview and semi-supervised learning in different ways. Our approach trains different monolingual classifiers on each of the views, such that the classifiers’ decisions over a set of unlabeled examples are in agreement as much as possible, and iteratively labels new examples from another unlabeled training set based on a consensus across language-specific classifiers. We derive a boosting-based training algorithm for this task, and analyze the impact of the number of views on the semi-supervised learning results o...
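
The consensus-based self-training step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' boosting-based algorithm and it omits the coregularization term entirely. It assumes a nearest-centroid model as a stand-in for each monolingual classifier and a unanimous-vote consensus rule; the names `fit_centroids`, `predict`, and `consensus_self_training` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Dict[int, List[float]]


def fit_centroids(X: List[Vector], y: List[int]) -> Model:
    """Stand-in for a monolingual classifier (assumption): one mean vector per class."""
    centroids: Model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids: Model, x: Vector) -> int:
    """Return the label of the closest centroid (squared Euclidean distance)."""
    def dist(c: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def consensus_self_training(
    labeled_views: List[List[Vector]],
    labels: List[int],
    unlabeled_views: List[List[Vector]],
    max_rounds: int = 10,
) -> Tuple[List[Model], Dict[int, int]]:
    """
    labeled_views[v][i]   : features of labeled document i in view (language) v
    unlabeled_views[v][j] : features of unlabeled document j in view v
    Returns the per-view models and a map {unlabeled index: pseudo-label}.
    """
    train = [list(view) for view in labeled_views]
    y = list(labels)
    pool = set(range(len(unlabeled_views[0])))
    pseudo: Dict[int, int] = {}
    for _ in range(max_rounds):
        models = [fit_centroids(X, y) for X in train]
        newly: Dict[int, int] = {}
        for j in sorted(pool):
            votes = {predict(m, view[j]) for m, view in zip(models, unlabeled_views)}
            # Assumed consensus rule: pseudo-label only when every view agrees.
            if len(votes) == 1:
                newly[j] = votes.pop()
        if not newly:
            break
        # Move the agreed-upon documents into every view's training set.
        for j, label in newly.items():
            for v, view in enumerate(unlabeled_views):
                train[v].append(view[j])
            y.append(label)
        pseudo.update(newly)
        pool -= set(newly)
    models = [fit_centroids(X, y) for X in train]
    return models, pseudo
```

In this sketch, a document on which the per-language models disagree stays in the unlabeled pool and is reconsidered in the next round, after the models have been retrained on the enlarged training set.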

Massih-Reza Amini, Cyril Goutte, Nicolas Usunier


Consensus-based Self-training | Information Management | Semi-supervised Learning | Semi-supervised Learning Copyright | SIGIR 2010 |


Post Info

Added	24 Aug 2010
Updated	24 Aug 2010
Type	Conference
Year	2010
Where	SIGIR
Authors	Massih-Reza Amini, Cyril Goutte, Nicolas Usunier
