Sciweavers

SIGIR
2008
ACM

Topic-bridged PLSA for cross-domain text classification

13 years 4 months ago
Topic-bridged PLSA for cross-domain text classification
In many Web applications, such as blog classification and newsgroup classification, labeled data are in short supply. It often happens that obtaining labeled data in a new domain is expensive and time consuming, while there may be plenty of labeled data in a related but different domain. Traditional text classification approaches are not able to cope well with learning across different domains. In this paper, we propose a novel cross-domain text classification algorithm which extends the traditional probabilistic latent semantic analysis (PLSA) algorithm to integrate labeled and unlabeled data, which come from different but related domains, into a unified probabilistic model. We call this new model Topic-bridged PLSA, or TPLSA. By exploiting the common topics between two domains, we transfer knowledge across different domains through a topic-bridge to help the text classification in the target domain. A unique advantage of our method is its ability to maximally mine knowledge that can...
Gui-Rong Xue, Wenyuan Dai, Qiang Yang, Yong Yu
Added 15 Dec 2010
Updated 15 Dec 2010
Type Journal
Year 2008
Where SIGIR
Authors Gui-Rong Xue, Wenyuan Dai, Qiang Yang, Yong Yu
Comments (0)