The concepts of similarity and distance are crucial in data mining. We consider the problem of defining the distance between two data sets by comparing summary statistics compute...
Background: The choice of probe set algorithms for expression summary in a GeneChip study has a great impact on subsequent gene expression data analysis. Spiked-in cRNAs with know...
We compare different statistical characterizations of a set of strings, for three different histogram-based distances. Given a distance, a set of strings may be characterized by it...
The existence of large datasets requires methods for producing simpler or summarized views of data. Typical approaches to summarization based on statistics do not capture complete...
Background: Many statistical algorithms combine microarray expression data and genome sequence data to identify transcription factor binding motifs in the low eukaryotic genomes. ...