To execute a shared memory program efficiently, we must manage memory consistency with low overhead and utilize the communication bandwidth of the platform as much as pos...
Global locality analysis is a technique for improving the cache performance of a sequence of loop nests through a combination of loop and data layout optimizations. Pure loop tran...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
A set of synchronization relations between distributed nonatomic events was recently proposed to provide real-time applications with a fine level of discrimination in the specifica...
Media applications are characterized by large amounts of available parallelism, little data reuse, and a high computation-to-memory-access ratio. While these characteristics are p...
Scott Rixner, William J. Dally, Ujval J. Kapasi, B...
In previous work, we investigated real-coded genetic algorithms with several types of multi-parent recombination operators and found evidence that multi-parent recombination w...