Semi-supervised learning of compact document representations with deep networks

Finding good representations of text documents is crucial in information retrieval and classification systems. Today the most popular document representation is based on a vector of word counts in the document. This representation neither captures dependencies between related words, nor handles synonyms or polysemous words. In this paper, we propose an algorithm to learn text document representations based on semi-supervised autoencoders that are stacked to form a deep network. The model can be trained efficiently on partially labeled corpora, producing very compact representations of documents, while retaining as much class information and joint word statistics as possible. We show that it is advantageous to exploit even a few labeled samples during training.
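
To make the limitation of word-count vectors concrete, here is a minimal sketch (not taken from the paper) that builds count vectors for two short documents and compares them with cosine similarity. The helper names `count_vector` and `cosine`, the whitespace tokenizer, and the two example sentences are all illustrative assumptions.

```python
from collections import Counter
import math


def count_vector(text, vocab):
    """Map a document to a vector of word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())  # naive whitespace tokenizer (assumption)
    return [counts[word] for word in vocab]


def cosine(a, b):
    """Cosine similarity between two count vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two hypothetical documents with similar meaning but no shared words.
doc_a = "the car is fast"
doc_b = "an automobile moves quickly"

vocab = sorted(set(doc_a.split()) | set(doc_b.split()))
vec_a = count_vector(doc_a, vocab)
vec_b = count_vector(doc_b, vocab)

# Each word occupies its own dimension, so synonyms such as "car" and
# "automobile" contribute nothing to the similarity score.
similarity = cosine(vec_a, vec_b)
print(vocab)
print(vec_a, vec_b, similarity)
```

Because the two sentences share no tokens, their count vectors are orthogonal even though a reader would judge them closely related. A learned compact representation of the kind the abstract proposes is intended to close that gap.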

Marc'Aurelio Ranzato, Martin Szummer

Compact Representations | ICML 2008 | Machine Learning | Popular Document Representation | Text Document Representations |

Added	17 Nov 2009
Updated	17 Nov 2009
Type	Conference
Year	2008
Where	ICML
Authors	Marc'Aurelio Ranzato, Martin Szummer
