PVLDB
2010

Interesting-Phrase Mining for Ad-Hoc Text Analytics

Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer a great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize the analysis of interesting phrases. These include named entities, important quotations, market slogans, and other multi-word phrases that are prominent in a dynamically derived ad-hoc subset of the corpus, e.g., being frequent in the subset but relatively infrequent in the overall corpus. We develop preprocessing and indexing methods for phrases, paired with new search techniques for the top-k most interesting phrases in ad-hoc subsets of the corpus. Our framework is evaluated using a large-scale real-world corpus of New York Times news articles.
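The notion of a phrase being "frequent in the subset but relatively infrequent in the overall corpus" can be illustrated with a small sketch. This is not the paper's indexing or search method: it is a brute-force illustration that assumes interestingness is the ratio of a phrase's document-frequency share in the subset to its share in the whole corpus. The function names, the `min_subset_df` threshold, and the whitespace tokenization are all hypothetical choices made for the example.

```python
from collections import Counter
import heapq


def ngrams(tokens, max_len=3):
    """Yield every contiguous phrase of 2 to max_len words."""
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def doc_frequencies(docs, max_len=3):
    """Count, for each phrase, how many documents contain it."""
    df = Counter()
    for text in docs:
        # set() so a phrase counts once per document
        df.update(set(ngrams(text.lower().split(), max_len)))
    return df


def top_k_interesting(corpus, subset_ids, k=5, min_subset_df=2, max_len=3):
    """Return up to k (score, phrase) pairs, highest score first.

    Assumed score: (share of subset docs containing the phrase)
                 / (share of all corpus docs containing the phrase).
    """
    corpus_df = doc_frequencies(corpus, max_len)
    subset_df = doc_frequencies([corpus[i] for i in subset_ids], max_len)
    n_corpus, n_subset = len(corpus), len(subset_ids)

    scored = []
    for phrase, df_s in subset_df.items():
        if df_s < min_subset_df:
            continue  # hypothetical threshold: skip phrases rare in the subset
        score = (df_s / n_subset) / (corpus_df[phrase] / n_corpus)
        scored.append((score, phrase))
    return heapq.nlargest(k, scored)


corpus = [
    "the central bank raised rates in the city",
    "the central bank cut rates in the city",
    "the team won in the city",
    "the team lost in the city",
]
# Ad-hoc subset: the two documents about the central bank.
results = top_k_interesting(corpus, subset_ids=[0, 1], k=8)
for score, phrase in results:
    print(f"{score:.2f}  {phrase}")
```

In this toy corpus, "central bank" appears only in the subset and so scores above "in the city", which appears in every document. Recomputing all phrase counts per query, as done here, would not scale to a large corpus; that cost is what precomputed phrase indexes and dedicated top-k search techniques are meant to avoid.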
Added 30 Jan 2011
Updated 30 Jan 2011
Type Journal
Year 2010
Where PVLDB
Authors Srikanta J. Bedathur, Klaus Berberich, Jens Dittrich, Nikos Mamoulis, Gerhard Weikum