Addressed in this paper is the issue of `email data cleaning' for text mining. Many text mining applications need take emails as input. Email data is usually noisy and thus i...
Order-preserving submatrixes (OPSMs) have been accepted as a biologically meaningful subspace cluster model, capturing the general tendency of gene expressions across a subset of ...
Byron J. Gao, Obi L. Griffith, Martin Ester, Steve...
It is often expensive to acquire data in real-world data mining applications. Most previous data mining and machine learning research, however, assumes that a fixed set of trainin...
In this paper we introduce a modular, highly flexible, opensource environment for data generation. Using an existing graphical data flow tool, the user can combine various types...
Abstract—Data sanitization has been used to restrict reidentification of individuals and disclosure of sensitive information from published data. We propose an attack on the pri...