ICML
2008
IEEE

Semi-supervised learning of compact document representations with deep networks

Finding good representations of text documents is crucial in information retrieval and classification systems. Today, the most popular document representation is a vector of word counts in the document. This representation neither captures dependencies between related words nor handles synonyms or polysemous words. In this paper, we propose an algorithm that learns text document representations using semi-supervised autoencoders stacked to form a deep network. The model can be trained efficiently on partially labeled corpora and produces very compact document representations that retain as much class information and joint word statistics as possible. We show that it is advantageous to exploit even a few labeled samples during training.
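To make the limitation of word-count vectors concrete, here is a minimal sketch in Python. It is not taken from the paper: the three example sentences, the whitespace tokenizer, and the helper names `count_vector` and `cosine` are all assumptions chosen for illustration. It shows that two documents with the same meaning but no shared words get a similarity of zero, while two documents that merely share common words score high.

```python
from collections import Counter
import math


def count_vector(text, vocabulary):
    """Return the word-count vector of `text` over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy documents (not from the paper).
doc_a = "the car is fast"
doc_b = "an automobile moves quickly"  # same meaning as doc_a, no shared words
doc_c = "the car is red"               # different meaning, three shared words

vocabulary = sorted(set(" ".join([doc_a, doc_b, doc_c]).split()))

vec_a = count_vector(doc_a, vocabulary)
vec_b = count_vector(doc_b, vocabulary)
vec_c = count_vector(doc_c, vocabulary)

sim_ab = cosine(vec_a, vec_b)  # synonyms are invisible to raw counts
sim_ac = cosine(vec_a, vec_c)  # surface overlap dominates the score

print(f"similarity(a, b) = {sim_ab:.2f}")
print(f"similarity(a, c) = {sim_ac:.2f}")
```

A learned compact representation, such as the one the abstract describes, is meant to place documents like `doc_a` and `doc_b` close together despite their disjoint vocabularies.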
Marc'Aurelio Ranzato, Martin Szummer
Added: 17 Nov 2009
Updated: 17 Nov 2009
Type: Conference
Year: 2008
Where: ICML
Authors: Marc'Aurelio Ranzato, Martin Szummer