Abstract— The generalised linear model (GLM) is the standard approach in classical statistics for regression tasks where it is appropriate to measure the data misfit using a lik...
Gavin C. Cawley, Gareth J. Janacek, Nicola L. C. T...
As the size and dimensionality of data sets increase, the task of feature selection has become increasingly important. In this paper we demonstrate how association rules can be us...
Today many applications routinely generate large quantities of data. The data often takes the form of (time) series, or more generally streams, i.e. an ordered sequence of records...
The problem of similarity search (query-by-content) has attracted much research interest. It is a difficult problem because of the inherently high dimensionality of the data. The ...
Rare category detection is an open challenge for active learning, especially in the de-novo case (no labeled examples), but of significant practical importance for data mining - ...