Sciweavers

DATAMINE
2006

Computing LTS Regression for Large Data Sets

Least trimmed squares (LTS) regression is based on the subset of h cases (out of n) whose least squares fit possesses the smallest sum of squared residuals. The coverage h may be set between n/2 and n. The LTS method was proposed by Rousseeuw (1984, p. 876) as a highly robust regression estimator, with breakdown value (n - h)/n. It turned out that the computation time of existing LTS algorithms grew too fast with the size of the data set, precluding their use for data mining. Therefore we develop a new algorithm called FAST-LTS. The basic ideas are an inequality involving order statistics and sums of squared residuals, and techniques which we call 'selective iteration' and 'nested extensions'. We also use an intercept adjustment technique to improve the precision. For small data sets FAST-LTS typically finds the exact LTS, whereas for larger data sets it gives more accurate results than existing algorithms for LTS and is faster by orders of magnitude. Moreover, FAST-LTS runs fas...
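To make the LTS objective concrete, here is a minimal Python sketch under stated assumptions. It is not the FAST-LTS algorithm itself: it omits selective iteration, nested extensions, and the intercept adjustment, and the abstract does not spell out the inequality it relies on. The sketch assumes the commonly described "concentration" idea: fit least squares on a subset, keep the h cases with the smallest squared residuals, and refit, which cannot increase the sum of squared residuals over the retained subset. The function names (`fit_subset`, `lts_sketch`) and the parameters `n_starts`, `max_iter`, and `seed` are hypothetical, chosen only for illustration.

```python
import numpy as np


def fit_subset(X, y, subset):
    """Ordinary least squares fit using only the rows listed in `subset`."""
    beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
    return beta


def lts_sketch(X, y, h, n_starts=50, max_iter=100, seed=0):
    """Approximate the LTS fit by random starts plus concentration-style steps.

    Illustrative only: the starting scheme and stopping rule are assumptions,
    not the procedure from the paper.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_beta, best_obj = None, np.inf

    for _ in range(n_starts):
        # Start from a random subset of p cases (just enough to fit p coefficients).
        subset = np.sort(rng.choice(n, size=p, replace=False))

        for _ in range(max_iter):
            beta = fit_subset(X, y, subset)
            squared_residuals = (y - X @ beta) ** 2
            # Keep the h cases with the smallest squared residuals.
            new_subset = np.sort(np.argsort(squared_residuals)[:h])
            if np.array_equal(new_subset, subset):
                break  # the subset no longer changes
            subset = new_subset

        # Objective: sum of squared residuals of the LS fit on the final subset.
        beta = fit_subset(X, y, subset)
        obj = float(np.sum((y[subset] - X[subset] @ beta) ** 2))
        if obj < best_obj:
            best_beta, best_obj = beta, obj

    return best_beta, best_obj


# Synthetic example: y = 1 + 2x with small noise, 30 of 100 cases shifted upward.
rng = np.random.default_rng(42)
n, h = 100, 60                      # breakdown value (n - h)/n = 0.4
x = rng.uniform(0.0, 10.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=n)
y[:30] += 50.0                      # contaminate 30% of the cases
X = np.column_stack([np.ones(n), x])

beta_lts, objective = lts_sketch(X, y, h)
beta_ols = fit_subset(X, y, np.arange(n))
print("LTS sketch:", beta_lts)
print("Plain OLS: ", beta_ols)
```

With h = 60 and 30 contaminated cases, every subset of size h contains at least 30 clean cases, so the subset with the smallest sum of squared residuals should consist of clean cases only. The plain least squares fit on all n cases is printed for comparison, since its intercept is pulled toward the shifted points.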
Peter Rousseeuw, Katrien van Driessen
Added 11 Dec 2010
Updated 11 Dec 2010
Type Journal
Year 2006
Where DATAMINE
Authors Peter Rousseeuw, Katrien van Driessen