Sciweavers

» Crash Management for Distributed Parallel Systems
ICPP
2009
IEEE
15 years 8 months ago
Broker Selection Strategies in Interoperable Grid Systems
The increasing demand for resources of the high performance computing systems has led to new forms of collaboration of distributed systems such as interoperable grid systems tha...
Ivan Rodero, Francesc Guim, Julita Corbalán...
HPDC
1998
IEEE
15 years 5 months ago
WebOS: Operating System Services for Wide Area Applications
In this paper, we argue for the power of providing a common set of OS services to wide area applications, including mechanisms for resource discovery, a global namespace, remote p...
Amin Vahdat, Thomas E. Anderson, Michael Dahlin, E...
CCGRID
2008
IEEE
15 years 3 months ago
Using Probabilistic Characterization to Reduce Runtime Faults in HPC Systems
The current trend in high performance computing is to aggregate ever larger numbers of processing and interconnection elements in order to achieve desired levels of compu...
Jim M. Brandt, Bert J. Debusschere, Ann C. Gentile...
EUROSYS
2008
ACM
15 years 10 months ago
Samurai: protecting critical data in unsafe languages
Programs written in type-unsafe languages such as C and C++ incur costly memory errors that result in corrupted data structures, program crashes, and incorrect results. We present...
Karthik Pattabiraman, Vinod Grover, Benjamin G. Zo...
ICPADS
2006
IEEE
15 years 7 months ago
Flexible, Low-overhead Event Logging to Support Resource Scheduling
Flexible resource management and scheduling policies require detailed system-state information. Traditional, monolithic operating systems with a centralized kernel derive the requ...
Jan Stoess, Volkmar Uhlig