Sciweavers

KDD
2001
ACM

Hierarchical cluster analysis of SAGE data for cancer profiling

In this paper we present a method for clustering SAGE (Serial Analysis of Gene Expression) data to detect similarities and dissimilarities between different types of cancer on the subcellular level. The data is extremely high-dimensional, and because of the way it is measured it contains many errors and missing values, which challenges any clustering algorithm. We therefore introduce special pre-processing techniques to reduce these errors and to restore missing data. These techniques are tailored to the process that generates the data and make only very conservative changes. Furthermore, we present a new subspace selection technique that uses the Wilcoxon test to identify a relevant subset of attributes (genes). This general technique can be applied to select subspaces for clustering whenever some high-level categories of interest are known for the data (such as cancerous and noncancerous). Finally, we discuss the results of the application ...
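As a rough illustration of the subspace selection idea described in the abstract, the sketch below keeps only the genes whose expression differs between two known categories and then clusters the libraries hierarchically on that subset. It is not the authors' procedure. Several things here are assumptions: the Wilcoxon test is taken to be the two-sample rank-sum variant (SciPy's `ranksums`), the function name `select_genes` and the threshold `alpha` are hypothetical, the linkage method and distance are arbitrary choices, and the data is synthetic rather than real SAGE counts. The pre-processing steps mentioned in the abstract are not modelled at all.

```python
import numpy as np
from scipy.stats import ranksums
from scipy.cluster.hierarchy import linkage, fcluster


def select_genes(X, labels, alpha=0.01):
    """Return column indices of genes that differ between two categories.

    X      : array of shape (n_libraries, n_genes) with expression values
    labels : boolean array, True for one category (e.g. cancerous)
    alpha  : hypothetical significance threshold (an assumption)
    """
    labels = np.asarray(labels, dtype=bool)
    # One rank-sum test per gene, comparing the two groups of libraries.
    pvals = np.array([
        ranksums(X[labels, j], X[~labels, j]).pvalue
        for j in range(X.shape[1])
    ])
    return np.flatnonzero(pvals < alpha)


# Synthetic toy data, NOT real SAGE measurements.
rng = np.random.default_rng(0)
n_per_group, n_genes = 10, 50
X = rng.poisson(5.0, size=(2 * n_per_group, n_genes)).astype(float)
labels = np.array([True] * n_per_group + [False] * n_per_group)
X[labels, :5] += 40.0  # first five genes strongly elevated in one category

# Restrict to the selected subspace, then cluster the libraries.
selected = select_genes(X, labels)
Z = linkage(X[:, selected], method="average", metric="euclidean")
clusters = fcluster(Z, t=2, criterion="maxclust")
```

The category labels are used only to choose the attributes. The clustering itself runs unsupervised on the reduced data, which matches the abstract's description of selecting subspaces "for the purpose of clustering" when high-level categories are known.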
Jörg Sander, Monica C. Sleumer, Raymond T. Ng
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2001
Where KDD