Random sampling is an appealing approach to build synopses of large data streams because random samples can be used for a broad spectrum of analytical tasks. Users are often inter...
Given a dataset P and a preference function f, a top-k query retrieves the k tuples in P with the highest scores according to f. Even though the problem is well-studied in convent...
The proliferation of knowledge-sharing communities and the advances in information extraction have enabled the construction of large knowledge bases using the RDF data model to re...
Nicoleta Preda, Gjergji Kasneci, Fabian M. Suchane...
Classification is an important problem in data mining. Given a database of records, each with a class label, a classifier generates a concise and meaningful description for each c...
Despite the flurry of anomaly-detection papers in recent years, effective ways to validate and compare proposed solutions have remained elusive. We argue that evaluating anomaly d...