
HPDC
2008
IEEE

DataLab: transactional data-parallel computing on an active storage cloud

Active storage clouds are an attractive platform for executing the large, data-intensive workloads found in many fields of science. However, active storage presents new system management challenges: a large system of fault-prone machines with local persistent state can easily degenerate into a mess of unreferenced data and runaway computations. Our solution to this problem is DataLab, a software framework for running data-parallel workloads on active storage clusters. DataLab provides a simple language for expressing workloads, works with legacy application codes, and achieves robustness through the use of distributed transactions. Our prototype implementation scales to 250 nodes on a large biometric image processing workload.
Categories and Subject Descriptors: C.4 [Performance]: Fault Tolerance; H.2.4 [Systems]: Parallel Databases
General Terms: Reliability, Performance
Keywords: Active Storage, Transactions, Cloud Computing
Brandon Rich, Douglas Thain
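The abstract attributes DataLab's robustness to distributed transactions but does not describe the protocol. The Python sketch below illustrates one common way to get that guarantee, assuming a two-phase commit (an assumption, not something the abstract states). Each node runs the function next to its own data and stages the output. The job's results become visible only if every node votes yes; otherwise, all staged outputs are discarded, so a failure leaves no unreferenced data behind. `StorageNode`, `run_transactional_map`, and the `.out` naming are hypothetical and do not reflect DataLab's actual language, API, or protocol.

```python
"""Hypothetical sketch of an all-or-nothing data-parallel job on active storage.

NOT DataLab's implementation: StorageNode and run_transactional_map are
illustrative names. Each node computes next to its own data and stages the
output under a transaction id; the coordinator applies a two-phase commit so a
failed job leaves no partial, unreferenced outputs behind.
"""
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be deduplicated
class StorageNode:
    name: str
    files: Dict[str, bytes] = field(default_factory=dict)  # committed, visible data
    staged: Dict[str, Dict[str, bytes]] = field(default_factory=dict)  # txn id -> pending outputs
    faulty: bool = False  # simulate a crashed or misbehaving machine

    def prepare(self, txn: str, key: str, func: Callable[[bytes], bytes]) -> bool:
        """Phase 1: compute on local data and stage the result; return this node's vote."""
        if self.faulty or key not in self.files:
            return False
        try:
            result = func(self.files[key])
        except Exception:
            return False  # a failing task votes to abort
        self.staged.setdefault(txn, {})[key + ".out"] = result
        return True

    def commit(self, txn: str) -> None:
        """Phase 2, success: publish this transaction's staged outputs."""
        self.files.update(self.staged.pop(txn, {}))

    def abort(self, txn: str) -> None:
        """Phase 2, failure: discard staged outputs so no orphaned data remains."""
        self.staged.pop(txn, None)


def run_transactional_map(txn: str, catalog: Dict[str, StorageNode],
                          func: Callable[[bytes], bytes]) -> bool:
    """Apply func to every file in catalog on the node storing it; commit all or none."""
    votes = [node.prepare(txn, key, func) for key, node in catalog.items()]
    participants = list(dict.fromkeys(catalog.values()))  # unique nodes, first-seen order
    decision = all(votes)
    for node in participants:
        if decision:
            node.commit(txn)
        else:
            node.abort(txn)
    return decision


if __name__ == "__main__":
    a = StorageNode("a", {"img1": b"abc"})
    b = StorageNode("b", {"img2": b"xyz"})
    catalog = {"img1": a, "img2": b}

    print(run_transactional_map("t1", catalog, bytes.upper))  # True: both outputs published
    b.faulty = True
    print(run_transactional_map("t2", catalog, lambda d: d[::-1]))  # False: nothing published
    print(sorted(a.files))  # ['img1', 'img1.out']
```

In the second run, node `a` computes its output successfully, but the job still aborts because node `b` fails, so `a` keeps only the results of the first transaction. A production protocol would also need to durably log the commit decision and handle coordinator crashes, both of which this sketch omits.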