Due to resource and power constraints, embedded processors often cannot afford dedicated floating-point units. For instance, the IBM PowerPC processor embedded in Xilinx Virtex-...
Ray C. C. Cheung, Dong-U Lee, Oskar Mencer, Wayne ...
Memory bandwidth limits the performance of important kernels in many scientific applications. Such applications often use sequences of Basic Linear Algebra Subprograms (BLAS), an...
Geoffrey Belter, Elizabeth R. Jessup, Ian Karlin, ...
New portable consumer embedded devices must execute multimedia applications (e.g., 3D games, video players and signal processing software, etc.) that demand extensive memory acces...
Engineering systems are becoming increasingly complex as state of the art technologies are incorporated into designs. Surety modeling and analysis is an emerging science which per...
James Davis, Jason Scott, Janos Sztipanovits, Marc...
Collective operations and non-blocking point-to-point operations are two important parts of MPI that each provide important performance and programmability benefits. Although non...