Sciweavers

205 search results - page 7 / 41
» On the Reliability of Large-Scale Distributed Systems A Topo...
HOTI
2005
IEEE
15 years 5 months ago
Challenges in Building a Flat-Bandwidth Memory Hierarchy for a Large-Scale Computer with Proximity Communication
Memory systems for conventional large-scale computers provide only limited bytes/s of data bandwidth when compared to their flop/s of instruction execution rate. The resulting bo...
Robert J. Drost, Craig Forrest, Bruce Guenin, Ron ...
CORR
2008
Springer
14 years 11 months ago
Realizing Fast, Scalable and Reliable Scientific Computations in Grid Environments
The practical realization of managing and executing large-scale scientific computations efficiently and reliably is quite challenging. Scientific computations often invo...
Yong Zhao, Ioan Raicu, Ian T. Foster, Mihael Hateg...
3PGCIC
2010
14 years 9 months ago
Using a Failure History Service for Reliable Grid Node Information
The need for reliability in Grid Systems is a difficult challenge which is very important in the context of highly dynamic systems composed of thousands of nodes. Failure manageme...
Catalin Leordeanu, Valentin Cristea, Thomas Ropars...
ICDCS
2010
IEEE
15 years 3 months ago
Visual, Log-Based Causal Tracing for Performance Debugging of MapReduce Systems
The distributed nature and large scale of MapReduce programs and systems pose two challenges in using existing profiling and debugging tools to understand MapReduce pr...
Jiaqi Tan, Soila Kavulya, Rajeev Gandhi, Priya Nar...
DEBS
2007
ACM
15 years 3 months ago
Adapting publish-subscribe routing to traffic demands
Most currently available content-based publish-subscribe systems designed to operate in large-scale, wired scenarios build their routing infrastructure as a set of b...
Matteo Migliavacca, Gianpaolo Cugola