Abstract— In many image processing applications, fast convolution of an image with a large 2D filter is required. Field Programable Gate Arrays (FPGAs) are often used to achieve...
Christos-Savvas Bouganis, George A. Constantinides...
: Since the era of vector and pipelined computing, the computational speed is limited by the memory access time. Faster caches and more cache levels are used to bridge the growing ...
good data partitioning scheme is the need of the time. However it is very diflcult to arrive at a good solution as the number of possible dutupartitionsfor a given real lifeprogra...
Task graph scheduling has been found effective in performance prediction and optimization of parallel applications. A number of static scheduling algorithms have been proposed for...
The Cell BE is a multicore processor with eight vector accelerators (called SPEs) that implement explicit cache management through direct memory access engines. While the Cell has...
Srinivas Chellappa, Franz Franchetti, Markus P&uum...