Grids have the potential to revolutionize computing by providing ubiquitous, on demand access to computational services and resources. However, grid systems are extremely large, c...
Alexandre Duarte, Francisco Vilar Brasileiro, Walf...
Web servers on the Internet need to maintain high reliability, but the cause of intermittent failures of web transactions is non-obvious. We use approximate Bayesian inference to ...
The distributed nature of the Internet makes it difficult for a single service provider to troubleshoot the disruptions experienced by its customers. We propose NetDiagnoser, a tr...
Designing a distributed fault tolerance algorithm requires careful analysis of both fault models and diagnosis strategies. A system will fail if there are too many active faults, ...
The structural redundancy inherent to on-chip interconnection networks [networks on chip (NoC)] can be exploited by adaptive routing algorithms in order to provide connectivity eve...