Addressed in this paper is the issue of `email data cleaning' for text mining. Many text mining applications need take emails as input. Email data is usually noisy and thus i...
We present iTag, a personalized tag recommendation system for blogs. iTag improves on the state-of-the-art in tag recommendation systems in two ways. First, iTag has much higher p...
In many Web applications, such as blog classification and newsgroup classification, labeled data are in short supply. It often happens that obtaining labeled data in a new domain ...
In most IR clustering problems, we directly cluster the documents, working in the document space, using cosine similarity between documents as the similarity measure. In many real...
Multi-document summarization aims to create a compressed summary while retaining the main characteristics of the original set of documents. Many approaches use statistics and mach...
Dingding Wang, Tao Li, Shenghuo Zhu, Chris H. Q. D...