Statistics that accurately describe the distribution of data values in the columns of relational tables are essential for effective query optimization in a database management sy...
Alexander Behm, Volker Markl, Peter J. Haas, Kesha...
A lift curve, with the true positive rate on the y-axis and the customer pull (or contact) rate on the x-axis, is often used to depict the model performance in many data mining ap...
There is growing interest in algorithms for processing and querying continuous data streams (i.e., data that is seen only once in a fixed order) with limited memory resources. In i...
Sumit Ganguly, Minos N. Garofalakis, Rajeev Rastog...
Essentially all data mining algorithms assume that the data-generating process is independent of the data miner's activities. However, in many domains, including spam detectio...
Nilesh N. Dalvi, Pedro Domingos, Mausam, Sumit K. ...
To fulfill the requirement of fast interactive multidimensional data analysis, database systems precompute aggregate views on some subsets of dimensions and their corresponding hi...
Amit Shukla, Prasad Deshpande, Jeffrey F. Naughton