Sciweavers

9 search results - page 2 / 2
» Performance Implications of Periodic Checkpointing on Large-...
ICDCS
2012
IEEE
11 years 7 months ago
Combining Partial Redundancy and Checkpointing for HPC
Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (10^15 floating point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...
PVLDB
2008
13 years 4 months ago
Fault-tolerant stream processing using a distributed, replicated file system
We present SGuard, a new fault-tolerance technique for distributed stream processing engines (SPEs) running in clusters of commodity servers. SGuard is less disruptive to normal s...
YongChul Kwon, Magdalena Balazinska, Albert G. Gre...
ICDCS
2011
IEEE
12 years 4 months ago
Provisioning a Multi-tiered Data Staging Area for Extreme-Scale Machines
Massively parallel scientific applications, running on extreme-scale supercomputers, produce hundreds of terabytes of data per run, driving the need for storage solutions to im...
Ramya Prabhakar, Sudharshan S. Vazhkudai, Youngjae...
INFOCOM
2006
IEEE
13 years 11 months ago
A Locating-First Approach for Scalable Overlay Multicast
Recent proposals in multicast overlay construction have demonstrated the importance of exploiting underlying network topology. However, these topology-aware proposals often rel...
Mohamed Ali Kâafar, Thierry Turletti, Walid ...