Finding Representative Set from Massive Data


In the information age, data is pervasive, and in some applications the data explosion is a significant phenomenon. The massive data volume poses challenges to both human users and computers. In this project, we propose a new model for identifying a representative set from a large database. A representative set is a special subset of the original dataset with three main characteristics: (1) it is significantly smaller than the original dataset; (2) it captures the most information from the original dataset compared to other subsets of the same size; and (3) it has low redundancy among the representatives it contains. We use information-theoretic measures such as mutual information and relative entropy to measure the representativeness of the representative set. We first design a greedy algorithm and then present a heuristic algorithm that delivers much better performance. We run experiments on two real datasets and evaluate the effectiveness of our representative set in terms of coverag...
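The general idea of greedily choosing representatives under an information-theoretic measure can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not spell out. It assumes each data item is a discrete probability distribution, uses relative entropy (KL divergence) to the nearest chosen representative as the cost, and adds one representative at a time. The function names `kl_divergence`, `total_cost`, and `greedy_representatives`, the smoothing constant `eps`, and the toy data are all hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D(p || q) in bits for two discrete distributions.

    eps is an assumed smoothing floor so zero entries in q do not divide by zero.
    """
    return sum(pi * math.log2(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def total_cost(data, reps):
    """Sum, over all items, of the divergence to the closest chosen representative."""
    return sum(min(kl_divergence(x, data[r]) for r in reps) for x in data)

def greedy_representatives(data, k):
    """Pick k indices greedily, each time adding the item that lowers the cost most."""
    chosen = []
    for _ in range(min(k, len(data))):
        best = min(
            (i for i in range(len(data)) if i not in chosen),
            key=lambda i: total_cost(data, chosen + [i]),
        )
        chosen.append(best)
    return chosen

# Toy data (illustrative only): two groups of similar distributions.
data = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.0, 0.2, 0.8],
]
reps = greedy_representatives(data, 2)
```

On this toy input the two selected indices fall in different groups, which reflects the low-redundancy property described above: a second representative from the same group would reduce the cost far less than one from the uncovered group.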

Feng Pan, Wei Wang 0010, Anthony K. H. Tung, Jiong Yang


Data Mining | ICDM 2005 | Original Dataset | Representative Set | Representative Set Attains |


» Identifying Representative Trends in Massive Time Series Data Sets Using Sketches

» Mining compressed commodity workflows from massive RFID data sets

» Finding Representative Association Rules from Large Rule Collections

» The DGX distribution for mining massive skewed data

» RepFrag: a graph based method for finding repeats and transposons from fragmented genomes

» Utilization of two sample t-test statistics from redundant probe sets to evaluate different...

» Finding Time Series Motifs in Disk-Resident Data

» See all by looking at a few: Sparse modeling for finding representative objects

Post Info

Added	24 Jun 2010
Updated	24 Jun 2010
Type	Conference
Year	2005
Where	ICDM
Authors	Feng Pan, Wei Wang 0010, Anthony K. H. Tung, Jiong Yang
