With over 800 million pages covering most areas of human endeavor, the World-wide Web is a fertile ground for data mining research to make a difference to the effectiveness of infor...
Spectral clustering is a widely used method for organizing data that only relies on pairwise similarity measurements. This makes its application to non-vectorial data straightforw...
Fabian L. Wauthier, Nebojsa Jojic, Michael I. Jord...
We present a novel approach to dealing with overfitting in black-box models. It is based on the leverages of the samples, i.e. on the influence that each observation has on the pa...
We study the problem of estimating selectivity of approximate substring queries. Its importance in databases is ever increasing as more and more data are input by users and are in...
One of the most important data mining tasks is discovery of frequently occurring patterns in sequences of events. Many algorithms for finding various patterns in sequenti...