This paper presents and validates performance models for a variety of high-performance collective communication algorithms for systems with Cell processors. The systems modeled in...
This paper presents a new algorithm called List-based Load Balancing (LLB) for compile-time task scheduling on distributed-memory machines. LLB is intended as a cluster-mapping an...
Andrei Radulescu, Arjan J. C. van Gemund, Hai-Xian...
WASMII, a virtual hardware system that executes dataflow algorithms, is based on an MPLD, an extended FPGA with multiple sets of configuration SRAM. Although we have developed an emu...
This paper presents a partitioning and allocation algorithm for an iterative stream compiler, targeting heterogeneous multiprocessors with constrained distributed memory and any c...
The storage requirements of the array-dominated and loop-organized algorithmic specifications running on embedded systems can be significant. Employing a data memory space much l...