CORR
2008
Springer

Accelerating Large-scale Data Exploration through Data Diffusion

Data-intensive applications often require exploratory analysis of large datasets. If analysis is performed on distributed resources, data locality can be crucial to high throughput and performance. We propose a "data diffusion" approach that acquires compute and storage resources dynamically, replicates data in response to demand, and schedules computations close to data. As demand increases, more resources are acquired, thus allowing faster response to subsequent requests that refer to the same data; when demand drops, resources are released. This approach can provide the benefits of dedicated hardware without the associated high costs, depending on workload and resource characteristics. The approach is reminiscent of cooperative caching, web caching, and peer-to-peer storage systems, but addresses different application demands. Other data-aware scheduling approaches assume dedicated resources, which can be expensive and/or inefficient if load varies significantly. To explo...
Added 09 Dec 2010
Updated 09 Dec 2010
Type Journal
Year 2008
Where CORR
Authors Ioan Raicu, Yong Zhao, Ian T. Foster, Alexander S. Szalay