The UTS benchmark is used to evaluate task parallelism in OpenMP 3.0 as implemented in a number of recently released compilers and run-time systems. UTS performs parallel search of...
For most parallel and high performance systems, tuning guides provide the users with advices to optimize the execution time of their programs. Execution time may be very sensitive...
Storage mapping optimization is a flexible approach to folding array dimensions in numerical codes. It is designed to reduce the memory footprint after a wide spectrum of loop tr...
New media and signal processing applications demand ever higher performance while operating within the tight power constraints of mobile devices. A range of hardware implementatio...
Kevin Fan, Manjunath Kudlur, Ganesh S. Dasika, Sco...
In this paper, we take the idea of application-level processing on disks to one level further, and focus on an architecture, called Cluster of Active Disks (CAD), where the storag...