Published data is prone to privacy attacks. Sanitization methods aim to prevent these attacks while maintaining usefulness of the data for legitimate users. Quantifying the trade-...
In this paper, we define and study a novel text mining problem, which we refer to as Comparative Text Mining (CTM). Given a set of comparable text collections, the task of compara...
Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series d...
Thanawin Rakthanmanon, Bilson J. L. Campana, Abdul...
Many data mining tasks (e.g., Association Rules, Sequential Patterns) use complex pointer-based data structures (e.g., hash trees) that typically suffer from sub-optimal data loca...
Srinivasan Parthasarathy, Mohammed Javeed Zaki, We...
The input to an algorithm that learns a binary classifier normally consists of two sets of examples, where one set consists of positive examples of the concept to be learned, and ...