Super-sparse principal component analyses for high-throughput genomic data



Background: Principal component analysis (PCA) has gained popularity as a method for the analysis of high-dimensional genomic data. However, the results are often difficult to interpret because the principal components are linear combinations of all variables, and the coefficients (loadings) are typically all nonzero. These nonzero values also reflect poor estimation of the true loading vectors: for gene expression data, for example, we expect biologically that only a portion of the genes are expressed in any given tissue, and that an even smaller fraction are involved in a particular process. Sparse PCA methods have recently been introduced to reduce the number of nonzero coefficients, but existing methods are not satisfactory for high-dimensional data because they still give too many nonzero coefficients.

Results: Here we propose a new PCA method that uses two innovations to produce an extremely sparse loading vector: (i) a random-effect model on the loadings that leads to an...
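The contrast the abstract draws between ordinary and sparse PCA can be illustrated with a small sketch. It assumes synthetic data (10 hypothetical "genes", of which only the first 3 share a latent factor) and uses a generic soft-thresholded power iteration, a common sparse-PCA heuristic. This is NOT the random-effect model the authors propose; the function names and the threshold `lam` are illustrative assumptions.

```python
import math
import random


def covariance(X):
    """Sample covariance matrix of X (a list of n rows, each with p variables)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    C = [[0.0] * p for _ in range(p)]
    for row in X:
        d = [row[j] - means[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                C[i][j] += d[i] * d[j] / (n - 1)
    return C


def normalize(v):
    """Scale v to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values with |x| <= lam become exactly 0."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def leading_loading(C, lam=0.0, iters=200):
    """Power iteration for the first loading vector of covariance matrix C.

    lam = 0 gives the ordinary PCA loading; lam > 0 soft-thresholds each
    update (generic sparse-PCA heuristic, not the paper's method).
    """
    p = len(C)
    v = normalize([1.0] * p)
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
        w = [soft_threshold(x, lam) for x in w]
        if all(x == 0.0 for x in w):
            return [0.0] * p  # lam too large: every loading shrunk to zero
        v = normalize(w)
    return v


# Hypothetical data: genes 0-2 share a latent factor z; genes 3-9 are pure noise.
random.seed(0)
X = []
for _ in range(500):
    z = random.gauss(0, 1)
    signal = [z + 0.3 * random.gauss(0, 1) for _ in range(3)]
    noise = [random.gauss(0, 1) for _ in range(7)]
    X.append(signal + noise)

C = covariance(X)
dense = leading_loading(C)            # ordinary PCA: every loading nonzero
sparse = leading_loading(C, lam=0.5)  # thresholded: noise genes dropped
print("nonzero (ordinary):", sum(1 for x in dense if x != 0.0))
print("nonzero (sparse):  ", [i for i, x in enumerate(sparse) if x != 0.0])
```

In the ordinary loading vector the 7 noise genes receive small but nonzero weights from sampling noise, which is the interpretability problem described above; thresholding sets them exactly to zero. The amount of sparsity depends entirely on the hand-picked `lam`, whereas the paper's stated aim is a model-based route to much sparser loadings.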

Donghwan Lee, Woojoo Lee, Youngjo Lee, Yudi Pawitan


BMCBI 2010 | Gene Expression Data | Principal Component | Principal Component Analysis |


Added	08 Dec 2010
Updated	08 Dec 2010
Type	Journal
Year	2010
Where	BMCBI
Authors	Donghwan Lee, Woojoo Lee, Youngjo Lee, Yudi Pawitan
