For most parallel and high performance systems, tuning guides provide the users with advices to optimize the execution time of their programs. Execution time may be very sensitive...
Multi-core processors, with low communication costs and high availability of execution cores, will increase the use of execution and compilation models that use short threads to e...
We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of searchbased performance optimizatio...
Samuel Williams, Jonathan Carter, Leonid Oliker, J...
For many object-oriented systems, it is often useful to have a runtime architecture that shows networks of communicating objects. But it is hard to statically extract runtime obje...
We present a memory efficient, practical, systolic, parallel architecture for the complete 0/1 knapsack dynamic programming problem, including backtracking. This problem was inte...