Heterogeneous Data Integration with the Consensus Clustering Formalism

Meaningfully integrating massive multi-experimental genomic data sets is becoming critical for the understanding of gene function. We have recently proposed methodologies for integrating large numbers of microarray data sets based on consensus clustering. Our methods combine gene clusters into a unified representation, or consensus, that is insensitive to misclassifications in the individual experiments. Here we extend their utility to heterogeneous data sets and focus on their refinement and improvement. First, we compare our best heuristic to the popular majority rule consensus clustering heuristic and show that the former yields tighter consensuses. We then propose a refinement to our consensus algorithm: clustering the source-specific clusterings before finding the consensus among them, which improves our original results and increases their biological relevance. We demonstrate our methodology on three data sets of yeast with biologically interesting...
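The majority rule baseline named in the abstract can be illustrated with a minimal sketch. This is one common reading of majority rule consensus, not necessarily the variant the authors evaluated: two items are merged when they share a cluster in more than half of the input clusterings, and merges are closed transitively. The function name `majority_rule_consensus` and the five-gene toy input are hypothetical and do not come from the paper.

```python
from itertools import combinations


def majority_rule_consensus(clusterings):
    """Merge items that share a cluster in more than half of the input clusterings.

    clusterings: list of equal-length label lists; clusterings[k][i] is the
    cluster label of item i in clustering k. Labels only need to be
    comparable within a single clustering.
    Returns a list of consensus labels, one per item.
    """
    n_items = len(clusterings[0])
    parent = list(range(n_items))  # union-find forest over items

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n_items), 2):
        votes = sum(1 for labels in clusterings if labels[i] == labels[j])
        if 2 * votes > len(clusterings):  # strict majority of clusterings agree
            parent[find(i)] = find(j)

    # Relabel union-find roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n_items)]


# Hypothetical toy input: three experiments clustering five genes.
# The third experiment places gene 2 with genes 3 and 4 instead.
experiments = [
    [0, 0, 0, 1, 1],
    [5, 5, 5, 7, 7],
    [0, 0, 1, 1, 1],
]
consensus = majority_rule_consensus(experiments)
print(consensus)
```

In this toy input the disagreement over gene 2 in a single experiment is outvoted by the other two, which is the kind of insensitivity to individual misclassifications the abstract describes.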
Vladimir Filkov, Steven Skiena
Added 01 Jul 2010
Updated 01 Jul 2010
Type Conference
Year 2004
Where DILS
Authors Vladimir Filkov, Steven Skiena