Loop distribution is one of the most useful techniques to reduce the execution time of parallel applications. Traditionally, loop scheduling algorithms are implemented based on pa...
Irregular and iterative I/O-intensive jobs need a different approach from parallel job schedulers. The focus in this case is not only the processing requirements anymore: memory, ...
— The efficient diagnosis of hardware and software faults in parallel and distributed systems remains a challenge in today’s most prolific decentralized environments. System-...
High-performance computing clusters running longlived tasks currently cannot have kernel software updates applied to them without causing system downtime. These clusters miss oppo...
: With the growing complexity of parallel architectures, the probability of system failures grows, too. One approach to cope with this problem is the self-healing, one of the organ...