On distributing symmetric streaming computations

11 years 11 months ago
On distributing symmetric streaming computations
A common approach for dealing with large data sets is to stream over the input in one pass, and perform computations using sublinear resources. For truly massive data sets, however, even making a single pass over the data is prohibitive. Therefore, streaming computations must be distributed over many machines. In practice, obtaining significant speedups using distributed computation has numerous challenges including synchronization, load balancing, overcoming processor failures, and data distribution. Successful systems in practice such as Google's MapReduce and Apache's Hadoop address these problems by only allowing a certain class of highly distributable tasks defined by local computations that can be applied in any order to the input. The fundamental question that arises is: How does the class of computational tasks supported by these systems differ from the class for which streaming solutions exist? We introduce a simple algorithmic model for massive, unordered, distribu...
Jon Feldman, S. Muthukrishnan, Anastasios Sidiropo
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2008
Where SODA
Authors Jon Feldman, S. Muthukrishnan, Anastasios Sidiropoulos, Clifford Stein, Zoya Svitkina
Comments (0)