In LAPACK many matrix operations are cast as block algorithms which iteratively process a panel using an unblocked algorithm and then update a remainder matrix using the high perf...
:. In designing application-specific bit-level architectures and in programming existing bit-level processor arrays, it is necessary to expand a word-level algorithm into its bit-...
This paper describes the implementation of a runtime library for asynchronous communication in the Cell BE processor. The runtime library implementation provides with several servi...
Abstract Parallel asynchronous iterative algorithms relax synchronization and communication requirements, and can potentially extend Desktop Grids beyond embarrassingly parallel ap...
Abstract. This paper presents a scalable method for parallel symbolic on-the-fly model checking in a distributed memory environment. Our method combines a scheme for on-the-fly mod...