Very large scale ReliefF for genome-wide association analysis

The genetic causes of many monogenic diseases have already been discovered. However, most common diseases are actually the result of complex nonlinear interactions between multiple genetic and environmental components. There is thus a pressing need for new computational methods capable of detecting nonlinearly interacting single nucleotide polymorphisms (SNPs) that are associated with disease, from among up to hundreds of thousands of candidate SNPs. Recently, some progress has been made using feature selection algorithms based on weights from the ReliefF data mining algorithm on sets of up to 1500 SNPs. However, the accuracy of ReliefF does not scale up to the sizes needed for truly large genome-scale SNP association studies. We propose a population-based variant dubbed VLSReliefF, which mitigates this performance drop by stochastically applying ReliefF to SNP subsets, and then assigning each SNP the maximum ReliefF weight it achieved in any subset. A heuristic method is proposed ...
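The subset-and-maximum idea described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `relief_weights` is a simplified Relief-style scorer standing in for full ReliefF, and the names `vls_relieff`, `subset_size`, `n_subsets`, `k`, and `seed` are hypothetical. The abstract does not say here how the subset size or the number of subsets should be chosen, so both are left as plain parameters.

```python
import numpy as np

def relief_weights(X, y, k=3):
    """Simplified Relief-style weights for categorical features.

    Assumption: this is a stand-in for ReliefF, not the full algorithm.
    X is (samples x features), y holds class labels.
    """
    n, m = X.shape
    w = np.zeros(m)
    for i in range(n):
        diff = (X != X[i]).astype(float)   # per-feature mismatch with sample i
        dist = diff.sum(axis=1)            # Hamming distance to sample i
        not_self = np.arange(n) != i
        same = np.where((y == y[i]) & not_self)[0]
        other = np.where(y != y[i])[0]
        if len(same) == 0 or len(other) == 0:
            continue                       # no usable neighbours for this sample
        hits = same[np.argsort(dist[same])[:k]]
        misses = other[np.argsort(dist[other])[:k]]
        # Reward features that differ on nearest misses, penalise on nearest hits.
        w += diff[misses].mean(axis=0) - diff[hits].mean(axis=0)
    return w / n

def vls_relieff(X, y, subset_size, n_subsets, k=3, seed=0):
    """Score random feature subsets and keep each feature's maximum weight."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    best = np.full(m, -np.inf)             # features never sampled stay at -inf
    for _ in range(n_subsets):
        idx = rng.choice(m, size=subset_size, replace=False)
        w = relief_weights(X[:, idx], y, k)
        best[idx] = np.maximum(best[idx], w)
    return best
```

In this sketch a SNP that is never drawn into any subset keeps a weight of negative infinity, so in practice the number of subsets would need to be large enough that every SNP is scored at least once.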
Margaret J. Eppstein, Paul Haake
Added 29 May 2010
Updated 29 May 2010
Type Conference
Year 2008