This paper takes a renewed look at the problem of managing intermediate data generated during dataflow computations (e.g., MapReduce, Pig, Dryad) within clouds. We d...
Steven Y. Ko, Imranul Hoque, Brian Cho, Indranil G...
Next-generation e-Science applications will require the ability to transfer information at high data rates between distributed computing centers and data repositories. To support ...
Parallelism can deliver major performance improvements in large data warehouses (DW) that face performance and scalability challenges. A simple low-cost shared-nothing architecture ...
Recent techniques for multicast or broadcast delivery of streaming media can provide immediate service to each client request yet achieve considerable client stream sharing (i.e.,...
Clustering is often formulated as a discrete optimization problem. The objective is to find, among all partitions of the data set, the best one according to some quality measure....