Novel, high-throughput technologies are challenging the core of algorithmic methods available in Computer Science. Microarray technologies give Life Sciences researchers the oppor...
Many data mining applications have a large amount of data but labeling data is often difficult, expensive, or time consuming, as it requires human experts for annotation. Semi-supe...
In a data streaming setting, data points are observed one by one. The concepts to be learned from the data points may change infinitely often as the data is streaming. In this pap...
Unsupervised clustering can be significantly improved using supervision in the form of pairwise constraints, i.e., pairs of instances labeled as belonging to same or different clu...
Existing data cleaning methods work on the basis of computing the degree of similarity between nearby records in a sorted database. High recall is achieved by accepting records wi...