Interesting-Phrase Mining for Ad-Hoc Text Analytics



Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer a great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize the analysis of interesting phrases. These include named entities, important quotations, market slogans, and other multi-word phrases that are prominent in a dynamically derived ad-hoc subset of the corpus, e.g., being frequent in the subset but relatively infrequent in the overall corpus. We develop preprocessing and indexing methods for phrases, paired with new search techniques for the top-k most interesting phrases in ad-hoc subsets of the corpus. Our framework is evaluated using a large-scale real-world corpus of New York Times news articles.
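The notion of a phrase being "frequent in the subset but relatively infrequent in the overall corpus" can be illustrated with a small brute-force sketch. Everything in it is an assumption made for illustration: the scoring function (subset document frequency divided by corpus document frequency), the whitespace tokenizer, the `min_subset_df` threshold, and the names `ngrams`, `document_frequencies`, and `top_k_interesting`. It does not reproduce the indexing or search techniques of the paper, which the abstract does not detail.

```python
from collections import Counter
import heapq


def ngrams(tokens, max_len):
    """Yield every contiguous word n-gram of length 2..max_len as a tuple."""
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def document_frequencies(docs, max_len):
    """Count, for each phrase, the number of documents that contain it."""
    df = Counter()
    for text in docs:
        # set() so a phrase repeated inside one document is counted once
        df.update(set(ngrams(text.lower().split(), max_len)))
    return df


def top_k_interesting(corpus, subset_ids, k=3, max_len=3, min_subset_df=2):
    """Return the k highest-scoring (score, phrase) pairs for an ad-hoc subset.

    Assumed score: documents in the subset containing the phrase, divided by
    documents in the whole corpus containing it. A score of 1.0 means the
    phrase occurs only inside the subset.
    """
    global_df = document_frequencies(corpus, max_len)
    subset_df = document_frequencies([corpus[i] for i in subset_ids], max_len)
    scored = (
        (count / global_df[phrase], " ".join(phrase))
        for phrase, count in subset_df.items()
        if count >= min_subset_df  # require some support inside the subset
    )
    return heapq.nlargest(k, scored)


# Toy corpus (made up); the ad-hoc subset is the documents mentioning "bank".
corpus = [
    "the central bank raised interest rates",
    "the central bank cut interest rates",
    "the home team won the final match",
    "the home team lost the final match",
    "interest rates hurt the home team budget",
]
subset_ids = [i for i, text in enumerate(corpus) if "bank" in text.split()]
result = top_k_interesting(corpus, subset_ids, k=3)
```

In this toy run, "interest rates" appears in both subset documents but also outside the subset, so it scores lower than phrases such as "central bank" that occur only within it. Recomputing phrase counts from scratch for every subset, as done here, is the cost that precomputed phrase indexes would be meant to avoid.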

Srikanta J. Bedathur, Klaus Berberich, Jens Dittrich, Nikos Mamoulis, Gerhard Weikum


Ad-hoc Subsets | Interesting Phrases | Large-scale Real-world Corpus | PVLDB 2010


Post Info

Added	30 Jan 2011
Updated	30 Jan 2011
Type	Journal
Year	2010
Where	PVLDB
Authors	Srikanta J. Bedathur, Klaus Berberich, Jens Dittrich, Nikos Mamoulis, Gerhard Weikum

