We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of searchbased performance optimizatio...
Samuel Williams, Jonathan Carter, Leonid Oliker, J...
For multi-gigahertz designs in nanometer technologies, data transfers on global interconnects take multiple clock cycles. In this paper, we propose a regular distributed register ...
This paper analyzes the impact of hardware multithreading support on the performance of distributed shared-memory DSM multiprocessors built out of heterogeneous, single-chip compu...
Renato J. O. Figueiredo, Jeffrey P. Bradford, Jos&...
We present a component-based framework and its supporting simulation tool for joint software-hardware modelling and performance analysis of multiprocessor embedded systems. This j...
: Sequences of data-dependent tasks, each one traversing large data sets, exist in many applications (such as video, image and signal processing applications). Those tasks usually ...