Clustering by pattern similarity in large data sets


Clustering is the process of grouping a set of objects into classes of similar objects. Although definitions of similarity vary from one clustering model to another, in most of these models the concept of similarity is based on distances, e.g., Euclidean distance or cosine distance. In other words, similar objects are required to have close values on at least a set of dimensions. In this paper, we explore a more general type of similarity. Under the pCluster model we propose, two objects are similar if they exhibit a coherent pattern on a subset of dimensions. For instance, in DNA microarray analysis, the expression levels of two genes may rise and fall synchronously in response to a set of environmental stimuli. Although the magnitudes of their expression levels may not be close, the patterns they exhibit can be very much alike. Discovery of such clusters of genes is essential in revealing significant connections in gene regulatory networks. E-commerce applications, such as collabora...
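To make the contrast between distance-based and pattern-based similarity concrete, here is a minimal Python sketch. It assumes that a "coherent pattern" means two objects differ by a roughly constant additive shift on the chosen dimensions, and it scores each pair of dimensions by how much the two objects' changes disagree. The function names (`euclidean`, `pattern_score`, `coherent_on`), the tolerance `delta`, and the sample gene values are hypothetical illustrations; the truncated abstract does not give the model's exact definition.

```python
import math
from itertools import combinations


def euclidean(x, y):
    """Distance-based similarity: small only if the values themselves are close."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pattern_score(x, y, a, b):
    """Hypothetical coherence score for dimensions a and b: how much the
    change from a to b in object x differs from the same change in object y."""
    return abs((x[a] - x[b]) - (y[a] - y[b]))


def coherent_on(x, y, dims, delta):
    """True if x and y rise and fall together (up to an additive shift)
    on every pair of the chosen dimensions, within tolerance delta."""
    return all(pattern_score(x, y, a, b) <= delta
               for a, b in combinations(dims, 2))


# Two hypothetical expression profiles: gene2 is gene1 shifted by +10
# on dimensions 0-2, but moves in the opposite direction on dimension 3.
gene1 = [1.0, 4.0, 2.0, 9.0]
gene2 = [11.0, 14.0, 12.0, 3.0]

print(round(euclidean(gene1, gene2), 2))          # 18.33
print(coherent_on(gene1, gene2, [0, 1, 2], 0.0))  # True
print(coherent_on(gene1, gene2, [0, 1, 2, 3], 0.0))  # False
```

The two profiles are far apart in Euclidean terms yet perfectly coherent on the subset {0, 1, 2}, which is the kind of similarity a distance-based model would miss. Checking every pair of dimensions is quadratic in the subset size; this sketch favors clarity over the efficient search a real clustering algorithm would need.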

Haixun Wang, Wei Wang 0010, Jiong Yang, Philip S. Yu


Certain Leading Indicators | Database | Expression Levels | SIGMOD 2002 | Similar Objects |


» IMDC: An Image-Mapped Data Clustering Technique for Large Datasets

» WebSets: extracting sets of entities from the web using unsupervised information extraction

» Subspace outlier mining in large multimedia databases

» A Unified View on Clustering Binary Data

» SyMP: an efficient clustering approach to identify clusters of arbitrary shapes in large da...

» Learning to match and cluster large high-dimensional data sets for data integration

» Mining for Putative Regulatory Elements in the Yeast Genome Using Gene Expression Data

» Discovering Representative Models in Large Time Series Databases

Post Info

Added	08 Dec 2009
Updated	08 Dec 2009
Type	Conference
Year	2002
Where	SIGMOD
Authors	Haixun Wang, Wei Wang 0010, Jiong Yang, Philip S. Yu


Sciweavers
