Multimedia instruction set extensions have become a prominent feature in desktop microprocessor platforms, promising superior performance on a wide range of floating-point and int...
This paper presents a novel technique to perform global optimization of communication and preprocessing calls in the presence of array accesses with arbitrary subscripts. Our sche...
The most popular representative devices of reconfigurable computing are the Field Programmable Gate Arrays (FPGAs). A promising feature of an FPGA is the ability to reuse...
Concurrent programming errors arise when threads share data incorrectly. Programmers often avoid these errors by using synchronization to enforce a simple ownership policy: data i...
Jean-Phillipe Martin, Michael Hicks, Manuel Costa,...
Power-efficient, massively parallel Graphics Processing Units (GPUs) have become increasingly influential in scientific computing over the past few years. However, their ef...
Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Kai Tian, ...