
AUSDM
2006
Springer

Data Mining Methodological Weaknesses and Suggested Fixes

Predictive accuracy claims should give explicit descriptions of the steps followed, with access to the code used. This allows referees and readers to check for common traps, and to repeat the same steps on other data. Feature selection, model selection, and tuning must all be independent of the test data. When cross-validation is used, these steps must be repeated within each fold. Even then, such accuracy assessments are limited because the target population, to which results will be applied, commonly differs from the source population. Typically it is shifted forward in time, and it may differ in other respects as well. A consequence of source/target differences is that highly sophisticated modeling may be pointless or even counter-productive. At best, model effects in the target population may be broadly similar. Investigation of the pattern of changes over time is required. Such studies are unusual in the data mining literature, in part because relevant data have not been avai...
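One trap the abstract warns about is choosing features on the full data set and only then cross-validating. The following is a minimal Python sketch, not the author's code, that contrasts the two orderings on pure-noise data, where honest accuracy should sit near 50%. The function names (`make_noise_data`, `select_features`, `cv_accuracy`, `compare`), the mean-difference feature score, the nearest-centroid classifier, and all sizes are hypothetical choices made for illustration.

```python
import random
import statistics


def make_noise_data(n, p, seed):
    """Pure-noise data: labels are independent of features, so the expected
    accuracy of any classifier on new data is 50%."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]  # balanced 0/1 labels
    rng.shuffle(y)  # random assignment, unrelated to X
    return X, y


def select_features(X, y, k):
    """Keep the k features with the largest |mean(class 1) - mean(class 0)|."""
    ones = [row for row, lab in zip(X, y) if lab == 1]
    zeros = [row for row, lab in zip(X, y) if lab == 0]
    scores = []
    for j in range(len(X[0])):
        m1 = sum(r[j] for r in ones) / len(ones)
        m0 = sum(r[j] for r in zeros) / len(zeros)
        scores.append((abs(m1 - m0), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def fit_centroids(X, y, feats):
    """Class means on the selected features (nearest-centroid classifier)."""
    cents = {}
    for c in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == c]
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    return cents


def predict(cents, row, feats):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((row[j] - m) ** 2 for j, m in zip(feats, cents[c]))
    return min((0, 1), key=dist)


def cv_accuracy(X, y, k, n_folds, select_inside):
    n = len(X)
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]
    # The trap: choosing features once, from ALL rows, test rows included.
    leaky_feats = None if select_inside else select_features(X, y, k)
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        Xtr = [X[i] for i in train_idx]
        ytr = [y[i] for i in train_idx]
        # The fix: redo selection using only this fold's training rows.
        feats = select_features(Xtr, ytr, k) if select_inside else leaky_feats
        cents = fit_centroids(Xtr, ytr, feats)
        correct += sum(predict(cents, X[i], feats) == y[i] for i in test_idx)
    return correct / n


def compare(n=40, p=500, k=10, n_folds=5, seeds=range(5)):
    """Average CV accuracy over several noise data sets, leaky vs per-fold."""
    leaky, honest = [], []
    for s in seeds:
        X, y = make_noise_data(n, p, s)
        leaky.append(cv_accuracy(X, y, k, n_folds, select_inside=False))
        honest.append(cv_accuracy(X, y, k, n_folds, select_inside=True))
    return statistics.mean(leaky), statistics.mean(honest)


if __name__ == "__main__":
    leaky, honest = compare()
    print(f"selection outside CV: {leaky:.2f}  selection inside each fold: {honest:.2f}")
```

On noise data like this, the leaky estimate typically comes out well above chance, while repeating selection within each fold keeps the estimate near 0.5; the exact figures depend on the seeds and sizes chosen.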
John H. Maindonald
Type Conference
Year 2006
Where AUSDM
Authors John H. Maindonald