Addressed in this paper is the issue of 'email data cleaning' for text mining. Many text mining applications need to take emails as input. Email data is usually noisy and thus i...
Most information extraction (IE) approaches have considered only static text corpora, over which we apply IE only once. Many real-world text corpora, however, are dynamic. They evol...
Fei Chen, Byron J. Gao, AnHai Doan, Jun Yang ...
Declarative data quality has been an active research topic. The fundamental principle behind a declarative approach to data quality is the use of declarative statements to realize...
Amit Chandel, Oktie Hassanzadeh, Nick Koudas, Moha...
We propose a General Markov Framework for computing page importance. Under the framework, a Markov Skeleton Process is used to model the random walk conducted by the web surfer on...
Bin Gao, Tie-Yan Liu, Zhiming Ma, Taifeng Wang, Ha...
Keyphrases are short phrases that reflect the main topic of a document. Because manually annotating documents with keyphrases is a time-consuming process, several automatic appro...
Katja Hofmann, Manos Tsagkias, Edgar Meij, Maarten...