Strong Feature Sets from Small Samples

For small samples, classifier design algorithms typically suffer from overfitting. Given a set of features, a classifier must be designed and its error estimated. For small samples, an error estimator may be unbiased but, owing to a large variance, often give very optimistic estimates. This paper proposes mitigating the small-sample problem by designing classifiers from a probability distribution resulting from spreading the mass of the sample points to make classification more difficult, while maintaining sample geometry. The algorithm is parameterized by the variance of the spreading distribution. By increasing the spread, the algorithm finds gene sets whose classification accuracy remains strong relative to greater spreading of the sample. The error gives a measure of the strength of the feature set as a function of the spread. The algorithm yields feature sets that can distinguish the two classes, not only for the sample data, but for distributions spread beyond the sample data. For linear...
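The idea of measuring error as a function of the spread can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes an isotropic Gaussian spreading distribution centered on each sample point and a fixed linear classifier, and it omits the search over feature sets. The names `spread_error`, `normal_cdf`, the toy data, and the classifier weights are all hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def spread_error(points, labels, w, b, sigma):
    """Error of the linear classifier sign(w.x + b) when each sample point
    is replaced by an isotropic Gaussian with standard deviation sigma
    (an assumed choice of spreading distribution)."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    total = 0.0
    for x, y in zip(points, labels):
        # Signed distance to the hyperplane; positive means correctly classified.
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w
        if sigma == 0.0:
            # No spreading: ordinary resubstitution error on the sample.
            total += 1.0 if margin <= 0.0 else 0.0
        else:
            # Gaussian mass that falls on the wrong side of the hyperplane.
            total += normal_cdf(-margin / sigma)
    return total / len(points)


# Hypothetical toy data: two features, labels in {-1, +1}.
points = [(0.0, 1.0), (0.5, 1.5), (2.0, 0.2), (2.5, 0.8)]
labels = [-1, -1, 1, 1]
w, b = (1.0, 0.0), -1.25  # hypothetical classifier: first feature > 1.25 means class +1

# Error as a function of the spread.
errors = {s: spread_error(points, labels, w, b, s) for s in (0.0, 0.5, 1.0, 2.0)}
for s, e in errors.items():
    print(f"sigma={s}: error={e:.4f}")
```

Under these assumptions, a feature set whose error grows slowly as `sigma` increases would count as stronger than one whose error rises quickly, which is the sense in which the abstract uses the error curve to measure strength.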
Seungchan Kim, Edward R. Dougherty, Junior Barrera
Added 22 Dec 2010
Updated 22 Dec 2010
Type Journal
Year 2002
Where JCB
Authors Seungchan Kim, Edward R. Dougherty, Junior Barrera, Yidong Chen, Michael L. Bittner, Jeffrey M. Trent