Free Online Productivity Tools
i2Speak
i2Symbol
i2OCR
iTex2Img
iWeb2Print
iWeb2Shot
i2Type
iPdf2Split
iPdf2Merge
i2Bopomofo
i2Arabic
i2Style
i2Image
i2PDF
iLatex2Rtf
Sci2ools

KDD

2004

ACM

2004

ACM

In this paper, we define and study a novel text mining problem, which we refer to as Comparative Text Mining (CTM). Given a set of comparable text collections, the task of comparative text mining is to discover any latent common themes across all collections as well as summarize the similarity and differences of these collections along each common theme. This general problem subsumes many interesting applications, including business intelligence and opinion summarization. We propose a generative probabilistic mixture model for comparative text mining. The model simultaneously performs cross-collection clustering and withincollection clustering, and can be applied to an arbitrary set of comparable text collections. The model can be estimated efficiently using the Expectation-Maximization (EM) algorithm. We evaluate the model on two different text data sets (i.e., a news article data set and a laptop review data set), and compare it with a baseline clustering method also based on a mixt...

Related Content

Added |
30 Nov 2009 |

Updated |
30 Nov 2009 |

Type |
Conference |

Year |
2004 |

Where |
KDD |

Authors |
ChengXiang Zhai, Atulya Velivelli, Bei Yu |

Comments (0)