Error Detection and Impact-Sensitive Instance Ranking in Noisy Datasets

Given a noisy dataset, locating erroneous instances and attributes and ranking suspicious instances by their impact on system performance is an interesting and important research issue. In this paper we propose an Error Detection and Impact-sensitive instance Ranking (EDIR) mechanism to address this problem. Given a noisy dataset D, we first train a benchmark classifier T from D. Instances that cannot be effectively classified by T are treated as suspicious and forwarded to a subset S. For each attribute Ai, we switch Ai and the class label C to train a classifier APi that predicts Ai. Given an instance Ik in S, we use APi and the benchmark classifier T to locate the erroneous value of each attribute Ai. To rank the instances in S quantitatively, we define an impact measure based on the Information-gain Ratio (IR): we calculate IRi between attribute Ai and the class C and use IRi as the impact-sensitive weight of Ai. The sum of impact-sensitive weights from all located erroneous attributes...
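The abstract walks through a concrete procedure, so a short sketch may help make it concrete. Below is a minimal Python illustration of the EDIR steps, assuming a categorical dataset held in a pandas DataFrame and scikit-learn decision trees as the classifiers; the helper names (gain_ratio, edir_rank), the ordinal encoding of attributes, and the use of training-set misclassifications to form the suspicious subset S are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier


def gain_ratio(x: pd.Series, y: pd.Series) -> float:
    """Information-gain ratio between a categorical attribute x and the class y."""
    def entropy(s: pd.Series) -> float:
        p = s.value_counts(normalize=True).to_numpy()
        return float(-(p * np.log2(p)).sum())

    h_y = entropy(y)
    h_y_given_x = sum((len(g) / len(y)) * entropy(g) for _, g in y.groupby(x))
    split_info = entropy(x)
    return 0.0 if split_info == 0 else (h_y - h_y_given_x) / split_info


def edir_rank(D: pd.DataFrame, class_col: str):
    X, y = D.drop(columns=[class_col]), D[class_col]

    # Step 1: benchmark classifier T trained on the (noisy) dataset D.
    x_enc = OrdinalEncoder().fit(X)
    T = DecisionTreeClassifier(random_state=0).fit(x_enc.transform(X), y)

    # Step 2: instances T cannot classify correctly form the suspicious subset S.
    # (Using training-set misclassification here is a simplification.)
    S = D.index[T.predict(x_enc.transform(X)) != y.to_numpy()]

    # Step 3: for each attribute Ai, swap Ai with the class label C and train
    # an attribute predictor APi from the remaining attributes plus C.
    predictors = {}
    for ai in X.columns:
        feats = D.drop(columns=[ai])
        enc_i = OrdinalEncoder().fit(feats)
        ap_i = DecisionTreeClassifier(random_state=0).fit(enc_i.transform(feats), D[ai])
        predictors[ai] = (ap_i, enc_i)

    # Impact-sensitive weight of Ai: information-gain ratio IRi between Ai and C.
    weights = {ai: gain_ratio(D[ai], y) for ai in X.columns}

    # Steps 4-5: flag attribute values that disagree with APi's prediction and
    # score each suspicious instance by the sum of the flagged attributes' weights.
    scores = {}
    for k in S:
        row_score = 0.0
        for ai, (ap_i, enc_i) in predictors.items():
            row = D.loc[[k]].drop(columns=[ai])
            if ap_i.predict(enc_i.transform(row))[0] != D.at[k, ai]:
                row_score += weights[ai]
        scores[k] = row_score

    # Rank suspicious instances by descending impact-sensitive score.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In practice the suspicious set would more likely be built with cross-validation rather than resubstitution, and the gain-ratio weights would be computed on discretized attributes if any are numeric; those refinements are left out of this sketch.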
Xingquan Zhu, Xindong Wu, Ying Yang
Added: 30 Oct 2010
Updated: 30 Oct 2010
Type: Conference
Year: 2004
Where: AAAI
Authors: Xingquan Zhu, Xindong Wu, Ying Yang