Abstract. Data-parallel languages like OpenCL and CUDA are an important means to exploit the computational power of today’s computing devices. In this paper, we deal with two asp...
The UTS benchmark is used to evaluate task parallelism in OpenMP 3.0 as implemented in a number of recently released compilers and run-time systems. UTS performs parallel search of...
This paper evaluates remote memory access (RMA) communication capabilities and performance on the Cray XT3. We discuss properties of the network hardware and Portals networking so...
Abstract. Programmability is an important capability that provides flexible computing devices, but it incurs significant performance and power penalties. We have proposed an archit...
Arthur Abnous, Katsunori Seno, Yuji Ichikawa, Marl...
This paper describes a new FPGA implementation of a system for evolutionary image filter design. Three parallel search algorithms are compared. An optimal mutation rate and the q...