Sciweavers

KDD
2005
ACM

Email data cleaning

14 years 5 months ago
Email data cleaning
Addressed in this paper is the issue of `email data cleaning' for text mining. Many text mining applications need take emails as input. Email data is usually noisy and thus it is necessary to clean it before mining. Several products offer email cleaning features, however, the types of noises that can be eliminated are restricted. Despite the importance of the problem, email cleaning has received little attention in the research community. A thorough and systematic investigation on the issue is thus needed. In this paper, email cleaning is formalized as a problem of non-text filtering and text normalization. In this way, email cleaning becomes independent from any specific text mining processing. A cascaded approach is proposed, which cleans up an email in four passes including non-text filtering, paragraph normalization, sentence normalization, and word normalization. As far as we know, non-text filtering and paragraph normalization have not been investigated previously. Methods ...
Jie Tang, Hang Li, Yunbo Cao, ZhaoHui Tang
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2005
Where KDD
Authors Jie Tang, Hang Li, Yunbo Cao, ZhaoHui Tang
Comments (0)