Sciweavers

Search results for "The Checkpoint Problem" (66 results, page 9 of 14)
SRDS
1999
IEEE
15 years 4 months ago
Logging and Recovery in Adaptive Software Distributed Shared Memory Systems
Software distributed shared memory (DSM) improves the programmability of message-passing machines and workstation clusters by providing a shared memory abstraction (i.e., a coherent global a...
Angkul Kongmunvattana, Nian-Feng Tzeng
DSN
2006
IEEE
15 years 5 months ago
BlueGene/L Failure Analysis and Prediction Models
The growing computational and storage needs of several scientific applications mandate the deployment of extreme-scale parallel machines, such as IBM’s BlueGene/L which can acc...
Yinglung Liang, Yanyong Zhang, Anand Sivasubramani...
ISCA
2006
IEEE
15 years 5 months ago
Tolerating Dependences Between Large Speculative Threads Via Sub-Threads
Thread-level speculation (TLS) has proven to be a promising method of extracting parallelism from both integer and scientific workloads, targeting speculative threads that range ...
Christopher B. Colohan, Anastassia Ailamaki, J. Gr...
LCPC
2009
Springer
15 years 4 months ago
A Communication Framework for Fault-Tolerant Parallel Execution
PC grids represent massive computation capacity at a low cost, but are challenging to employ for parallel computing because of variable and unpredictable performance and availabili...
Nagarajan Kanna, Jaspal Subhlok, Edgar Gabriel, Es...
IWSEC
2009
Springer
15 years 6 months ago
Tamper-Tolerant Software: Modeling and Implementation
Common software-protection systems attempt to detect malicious observation and modification of protected applications. Upon tamper detection, anti-hacking code may produ...
Mariusz H. Jakubowski, Chit Wei Saw, Ramarathnam V...