Sciweavers

36 search results - page 3 / 8
» Checkpointing and recovery in a transaction-based DSM operat...
DELOS
2007
13 years 7 months ago
Integration of Reliable Sensor Data Stream Management into Digital Libraries
Data Stream Management (DSM) addresses the continuous processing of sensor data. DSM requires the combination of stream operators, which may run on different distributed devices, ...
Gert Brettlecker, Heiko Schuldt, Peter M. Fischer,...
ICDCS
2012
IEEE
11 years 8 months ago
Combining Partial Redundancy and Checkpointing for HPC
Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (10^15 floating point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...
ISPDC
2003
IEEE
13 years 11 months ago
Lightweight Logging and Recovery for Distributed Shared Memory over Virtual Interface Architecture
As software Distributed Shared Memory (DSM) systems become attractive on larger clusters, the focus of attention moves toward improving the reliability of systems. In this paper, w...
Soyeon Park, Youngjae Kim, Seung Ryoul Maeng
DFT
2003
IEEE
13 years 11 months ago
Fault Recovery Based on Checkpointing for Hard Real-Time Embedded Systems
Safety-critical embedded systems often operate in harsh environmental conditions that necessitate fault-tolerant computing techniques. Many safety-critical systems also execute re...
Ying Zhang, Krishnendu Chakrabarty
CCGRID
2006
IEEE
13 years 11 months ago
Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation
With the increasing number of processors in modern HPC (High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...
Yuan Tang, Graham E. Fagg, Jack Dongarra