Fault tolerance is a major concern for critical high-performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H&e...
As high-performance clusters continue to grow in size, the mean time between failures shrinks. Thus, fault tolerance and reliability are becoming one of the challengi...
Current ubiquitous computing environments provide many kinds of information. This information may be accessed by different users under varying conditions depending on various conte...
Despite the use of high-speed interconnects such as InfiniBand, the communication overhead for parallel applications is still high. In this paper, we show how such costs...
Robert Rex, Frank Mietke, Wolfgang Rehm, Christoph...
In previous work, a new approach called Open CSP (OCSP) was defined as a way to integrate information gathering and problem solving. Instead of collecting all variable values befor...