
KDD
2002
ACM

Mining complex models from arbitrarily large databases in constant time

In this paper we propose a scaling-up method that is applicable to essentially any induction algorithm based on discrete search. The result of applying the method to an algorithm is that its running time becomes independent of the size of the database, while the decisions made are essentially identical to those that would be made given infinite data. The method works within pre-specified memory limits and, as long as the data is i.i.d., only requires accessing it sequentially. It gives anytime results, and can be used to produce batch, stream, time-changing and active-learning versions of an algorithm. We apply the method to learning Bayesian networks, developing an algorithm that is faster than previous ones by orders of magnitude, while achieving essentially the same predictive performance. We observe these gains on a series of large databases generated from benchmark networks, on the KDD Cup 2000 e-commerce data, and on a Web log containing 100 million requests.
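The idea of making a search decision after a bounded number of sequentially read examples can be illustrated with a small sketch. The abstract does not name the statistical test used, so this example assumes a Hoeffding bound on the difference between the two leading candidates; the names `hoeffding_epsilon`, `pick_best`, `value_range`, `delta`, and `check_every` are hypothetical and chosen only for illustration. It is not the authors' implementation.

```python
import math
from typing import Callable, Iterable, Sequence, Tuple


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """One-sided Hoeffding bound: with probability at least 1 - delta, the true
    mean of a variable with the given range is within epsilon of the mean of
    n i.i.d. samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def pick_best(
    stream: Iterable[float],
    candidates: Sequence[Callable[[float], float]],
    value_range: float = 1.0,   # assumed: each score lies in [0, value_range]
    delta: float = 1e-6,        # assumed: allowed probability of a wrong pick
    check_every: int = 100,     # assumed: how often the bound is tested
) -> Tuple[int, int]:
    """Read examples sequentially and return (index of chosen candidate,
    number of examples consumed). Needs at least two candidates."""
    sums = [0.0] * len(candidates)
    n = 0
    for example in stream:
        n += 1
        for i, score in enumerate(candidates):
            sums[i] += score(example)
        if n % check_every == 0:
            means = [s / n for s in sums]
            order = sorted(range(len(means)), key=means.__getitem__, reverse=True)
            gap = means[order[0]] - means[order[1]]
            # The per-example score difference lies in [-R, R], so its range is 2R.
            if gap > hoeffding_epsilon(2.0 * value_range, delta, n):
                return order[0], n  # confident enough: stop reading data
    if n == 0:
        raise ValueError("stream was empty")
    # Data ran out before the bound was met: fall back to the current leader.
    means = [s / n for s in sums]
    return max(range(len(means)), key=means.__getitem__), n
```

In this sketch the number of examples consumed depends on how far apart the candidates are and on `delta`, not on the length of the stream, which is the sense in which the running time of one decision is independent of database size. Because the leader is known at every check, the loop can also be interrupted to give an anytime answer.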
Geoff Hulten, Pedro Domingos
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2002
Where KDD