Sampling is a widely used technique to increase efficiency in database and data mining applications operating on large datasets. In this paper we present a scalable sampling imple...
Correlated motif mining (CMM) is the problem of finding overrepresented pairs of patterns, called motif pairs, in interacting protein sequences. Algorithmic solutions for CMM thereb...
Peter Boyen, Frank Neven, Dries Van Dyck, Aalt-Jan...
Inspired by emerging multi-core computer architectures, in this paper we present MT CLOSED, a multi-threaded algorithm for frequent closed itemset mining (FCIM). To the best of ou...
We present a strategy for answering fact-based natural language questions that is guided by a characterization of real-world user queries. Our approach, implemented in a system cal...
Traditional association mining algorithms use a strict definition of support that requires every item in a frequent itemset to occur in each supporting transaction. In real-life d...
Rohit Gupta, Gang Fang, Blayne Field, Michael Stei...
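As a minimal sketch of the strict definition of support mentioned in the abstract above: a transaction supports an itemset only if it contains every item of that itemset. The function name `strict_support` and the toy transactions are illustrative assumptions and are not taken from the paper.

```python
from typing import Hashable, Iterable, Set


def strict_support(itemset: Iterable[Hashable],
                   transactions: Iterable[Set[Hashable]]) -> float:
    """Fraction of transactions that contain every item of `itemset`.

    Hypothetical helper for illustration only (not the authors' code).
    """
    items = frozenset(itemset)
    txns = list(transactions)
    if not txns:
        return 0.0
    # A transaction counts only if the itemset is a subset of it.
    hits = sum(1 for t in txns if items <= set(t))
    return hits / len(txns)


# Toy data (assumed for illustration).
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
]

support_ab = strict_support({"a", "b"}, transactions)
support_abc = strict_support({"a", "b", "c"}, transactions)
```

Under this definition, a transaction that is missing even one item of the itemset contributes nothing to the support count, which is why the third and fourth toy transactions do not count towards `{"a", "b"}`.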