This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated la...
Victor S. Sheng, Foster J. Provost, Panagiotis G. ...
Real-life data is often dirty and costs businesses worldwide billions of pounds each year. This paper presents a promising approach to improving data quality. It effectively det...
Active inference seeks to maximize classification performance while minimizing the amount of data that must be labeled ex ante. This task is particularly relevant in the context o...
Matthew J. Rattigan, Marc Maier, David Jensen, Bin...
Researchers in the data mining area frequently have to spend a significant portion of their time preprocessing data in order to apply their algorithms to real-world datasets...
Zhaoqi Chen, Dmitri V. Kalashnikov, Sharad Mehrotr...
In this paper we present UMiner, a new data mining system, which improves the quality of the data analysis results, handles uncertainty in the clustering & classification proce...
Christos Amanatidis, Maria Halkidi, Michalis Vazir...