Sciweavers

DKE
2008

Extracting k most important groups from data efficiently

We study an important data analysis operator that extracts the k most important groups from data (i.e., the k groups with the highest aggregate values). In a data warehousing context, an example of such a query is "find the 10 combinations of product-type and month with the largest sum of sales". The problem is challenging because the potential number of groups can be much larger than the memory capacity. We propose on-demand methods for efficient top-k groups processing under limited memory. In particular, we design top-k groups retrieval techniques for three representative scenarios. For the scenario with data physically ordered by measure, we propose the write-optimized multi-pass sorted access algorithm (WMSA), which exploits available memory for efficient top-k groups computation. For the scenario with unordered data, we develop the recursive hash algorithm (RHA), which applies hashing with early aggregation, coupled with branch-and-bound technique...
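To make the operator concrete, here is a minimal sketch of the top-k groups query in Python. It is a naive in-memory baseline, not the WMSA or RHA algorithms from the paper: it assumes every group total fits in memory, which is exactly the assumption the paper sets out to avoid. The function name `top_k_groups` and the sample `sales` rows are hypothetical and chosen only for illustration.

```python
import heapq
from collections import defaultdict


def top_k_groups(rows, k):
    """Return the k groups with the largest summed measure.

    rows: iterable of (group_key, measure) pairs.
    Naive baseline: keeps one running total per group in memory.
    """
    totals = defaultdict(float)
    for key, value in rows:
        totals[key] += value  # aggregate each row into its group
    # Select the k largest totals without sorting every group.
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])


# Hypothetical sales rows: ((product_type, month), sale_amount)
sales = [
    (("laptop", "2008-01"), 1200.0),
    (("laptop", "2008-02"), 800.0),
    (("phone", "2008-01"), 500.0),
    (("laptop", "2008-01"), 300.0),
    (("phone", "2008-02"), 900.0),
]

print(top_k_groups(sales, 2))
# → [(('laptop', '2008-01'), 1500.0), (('phone', '2008-02'), 900.0)]
```

The memory cost of this baseline grows with the number of distinct groups, so it breaks down in the setting the abstract describes, where the potential number of groups far exceeds memory capacity.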
Man Lung Yiu, Nikos Mamoulis, Vagelis Hristidis
Added 10 Dec 2010
Updated 10 Dec 2010
Type Journal
Year 2008
Where DKE