Process technology has reduced in size such that it is possible to implement complete applicationspecific architectures as Systems-on-Chip (SoCs) using both Application-Specific I...
As semiconductor processing technology continues to scale down, managing reliability becomes an increasingly difficult challenge in high-performance microprocessor design. Transie...
Abstract—While measures such as raw compute performance and system capacity continue to be important factors for evaluating cluster performance, such issues as system reliability...
William M. Jones, John T. Daly, Nathan DeBardelebe...
—Accurate fault detection is a key element of resilient computing. Syslogs provide key information regarding faults, and are found on nearly all computing systems. Discovering ne...
Hidden information is a critical issue for the successful delivery of SLAs in grid systems. It arises when the agents (hardware and software resources) employed to serve a task be...