AIRS
2006
Springer

Learning to Separate Text Content and Style for Classification

Download www.comp.nus.edu.sg

Many text documents naturally have two kinds of labels. For example, we may label web pages from universities according to their categories, such as "student" or "faculty", or according to their source universities, such as "Cornell" or "Texas". We call one kind of label the content and the other kind the style. Given a set of documents, each with both content and style labels, we seek to learn to classify a set of documents in a new style, with no content labels, into their content classes. Assuming that every document is generated using words drawn from a mixture of two multinomial component models, one content model and one style model, we propose a method named Cartesian EM that constructs content models and style models through Expectation Maximization and performs classification of the unknown content classes transductively. Our experiments on real-world datasets show the proposed method to be effective for style-independent text conten...
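
The abstract gives only the outline of the model, so the following is a minimal sketch of the general idea rather than the authors' Cartesian EM: each word is treated as drawn either from a content multinomial or from a style multinomial, and EM alternates between soft word assignments and model re-estimation while inferring content classes for unlabeled documents in a new style. The fixed mixing weight `lam`, the smoothing constant `alpha`, the uniform prior over content classes, the function names, and the toy word counts are all assumptions made for illustration.

```python
import math

def normalize(counts, vocab, alpha=0.1):
    """Turn (weighted) word counts into a smoothed multinomial over vocab."""
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts.get(w, 0.0) + alpha) / total for w in vocab}

def content_posterior(doc, content_models, style_model, lam):
    """Posterior over content classes for one document, uniform class prior (assumed)."""
    log_scores = {
        c: sum(n * math.log(lam * pc[w] + (1 - lam) * style_model[w])
               for w, n in doc.items())
        for c, pc in content_models.items()
    }
    top = max(log_scores.values())  # subtract the max for numerical stability
    scores = {c: math.exp(v - top) for c, v in log_scores.items()}
    z = sum(scores.values())
    return {c: v / z for c, v in scores.items()}

def two_component_em_sketch(labeled, unlabeled, new_style, lam=0.5, iters=20):
    """labeled: list of (word_counts, content, style); unlabeled: list of word_counts."""
    contents = sorted({c for _, c, _ in labeled})
    styles = sorted({s for _, _, s in labeled} | {new_style})
    vocab = sorted({w for d, _, _ in labeled for w in d} |
                   {w for d in unlabeled for w in d})
    # Initialise content models from labeled counts and style models as uniform.
    init = {c: {} for c in contents}
    for doc, c, _ in labeled:
        for w, n in doc.items():
            init[c][w] = init[c].get(w, 0.0) + n
    content_models = {c: normalize(init[c], vocab) for c in contents}
    style_models = {s: normalize({}, vocab) for s in styles}
    for _ in range(iters):
        # E-step, part 1: soft content labels for the unlabeled documents.
        posts = [content_posterior(d, content_models, style_models[new_style], lam)
                 for d in unlabeled]
        weighted = [(d, {c: 1.0}, s) for d, c, s in labeled]
        weighted += [(d, p, new_style) for d, p in zip(unlabeled, posts)]
        c_counts = {c: {} for c in contents}
        s_counts = {s: {} for s in styles}
        for doc, class_weights, s in weighted:
            for c, wt in class_weights.items():
                for w, n in doc.items():
                    # E-step, part 2: responsibility of the content component for w.
                    pc = lam * content_models[c][w]
                    ps = (1 - lam) * style_models[s][w]
                    r = pc / (pc + ps)
                    c_counts[c][w] = c_counts[c].get(w, 0.0) + wt * n * r
                    s_counts[s][w] = s_counts[s].get(w, 0.0) + wt * n * (1 - r)
        # M-step: re-estimate both families of multinomials.
        content_models = {c: normalize(c_counts[c], vocab) for c in contents}
        style_models = {s: normalize(s_counts[s], vocab) for s in styles}
    return [content_posterior(d, content_models, style_models[new_style], lam)
            for d in unlabeled]

# Toy data (hypothetical): content = page category, style = source university.
labeled = [
    ({"course": 3, "homework": 2, "cornell": 2}, "student", "Cornell"),
    ({"research": 3, "publication": 2, "cornell": 2}, "faculty", "Cornell"),
    ({"course": 3, "homework": 2, "texas": 2}, "student", "Texas"),
    ({"research": 3, "publication": 2, "texas": 2}, "faculty", "Texas"),
]
unlabeled = [
    {"course": 2, "homework": 1, "washington": 2},
    {"research": 2, "publication": 2, "washington": 1},
]
posts = two_component_em_sketch(labeled, unlabeled, new_style="Washington")
print([max(p, key=p.get) for p in posts])
```

The unlabeled documents take part in every M-step through their soft content labels, which is what makes the sketch transductive: the style model for the new style is fitted on the very documents being classified.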

Dell Zhang, Wee Sun Lee

AIRS 2006 | Content Classes | Content Models | Information Management | Many Text Documents |

Related Content

» A Feature Based Approach to Leveraging Context for Classifying Newsgroup Style Discussion ...

» Style-content separation by anisotropic part scales

» Boosting for Text Classification with Semantic Features

» Style mining of electronic messages for multiple authorship discrimination first results

» Treating metadata as annotations separating the content markup from the content

» Gait Style and Gait Content Bilinear Models for Gait Recognition Using Gait Resampling

» Semi-supervised Collaborative Text Classification

» Eye-Tracking Users' Behavior in Relation to Cognitive Style within an E-learning Environment

» Text Classification Methodologies Applied to MicroText in Military Chat

Post Info

Added	20 Aug 2010
Updated	20 Aug 2010
Type	Conference
Year	2006
Where	AIRS
Authors	Dell Zhang, Wee Sun Lee
