Sciweavers

ECRTS
1999
IEEE
15 years 2 months ago
Handling sporadic tasks in off-line scheduled distributed real-time systems
Many industrial applications mandate the use of a time-triggered paradigm and consequently the use of off-line scheduling for reasons such as predictability, certification, cost, o...
Damir Isovic, Gerhard Fohler
IPPS
2006
IEEE
15 years 3 months ago
Plan-based replication for fault-tolerant multi-agent systems
The growing importance of multi-agent applications and the need for a higher quality of service in these systems justify the increasing interest in fault-tolerant multi-agent syst...
Alessandro de Luna Almeida, Samir Aknine, Jean-Pie...
WOSS
2004
ACM
15 years 3 months ago
Combining statistical monitoring and predictable recovery for self-management
Complex distributed Internet services form the basis not only of e-commerce but increasingly of mission-critical network-based applications. What is new is that the workload and in...
Armando Fox, Emre Kiciman, David A. Patterson
SC
2009
ACM
15 years 4 months ago
FALCON: a system for reliable checkpoint recovery in shared grid environments
In Fine-Grained Cycle Sharing (FGCS) systems, machine owners voluntarily share their unused CPU cycles with guest jobs, as long as the performance degradation is tolerable. For gu...
Tanzima Zerin Islam, Saurabh Bagchi, Rudolf Eigenm...
FAST
2011
14 years 1 months ago
Consistent and Durable Data Structures for Non-Volatile Byte-Addressable Memory
The predicted shift to non-volatile, byte-addressable memory (e.g., Phase Change Memory and Memristor), the growth of “big data”, and the subsequent emergence of frameworks su...
Shivaram Venkataraman, Niraj Tolia, Parthasarathy ...