One common predictive modeling challenge occurs in text mining problems is that the training data and the operational (testing) data are drawn from different underlying distributi...
Addressed in this paper is the issue of `email data cleaning' for text mining. Many text mining applications need take emails as input. Email data is usually noisy and thus i...
While a typical software component has a clearly specified (static) interface in terms of the methods and the input/output types they support, information about the correct sequen...
Current methods for selectivity estimation fall into two broad categories, synopsis-based and sampling-based. Synopsis-based methods, such as histograms, incur minimal overhead at ...
There are two broad approaches to query evaluation over probabilistic databases: (1) Intensional Methods proceed by manipulating expressions over symbolic events associated with u...