This paper describes a novel approach to generate an optimized schedule to run threads on distributed shared memory (DSM) systems. The approach relies upon a binary instrumentatio...
We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of searchbased performance optimizatio...
Samuel Williams, Jonathan Carter, Leonid Oliker, J...
In this paper, we overview general hardware architecture and a programming model of SRC-6ETM reconfigurable computers, and compare the performance of the SRC-6E machine vs. IntelĀ...
Osman Devrim Fidanci, Daniel S. Poznanovic, Kris G...
This paper describes a mechanism for automatic design and synthesis of very long instruction word (VLIW), and its generalization, explicitly parallel instruction computing rocesso...
We present an automatic skew mitigation approach for userdeļ¬ned MapReduce programs and present SkewTune, a system that implements this approach as a drop-in replacement for an e...
YongChul Kwon, Magdalena Balazinska, Bill Howe, Je...