Sciweavers

567 search results - page 28 / 114
» Program Optimization and Parallelization Using Idioms
PLDI
2012
ACM
13 years 8 months ago
Adaptive input-aware compilation for graphics engines
While graphics processing units (GPUs) provide low-cost and efficient platforms for accelerating high performance computations, the tedious process of performance tuning required...
Mehrzad Samadi, Amir Hormati, Mojtaba Mehrara, Jan...
IEEEPACT
2005
IEEE
15 years 11 months ago
Characterization of TCC on Chip-Multiprocessors
Transactional Coherence and Consistency (TCC) is a novel coherence scheme for shared memory multiprocessors that uses programmer-defined transactions as the fundamental unit of p...
Austen McDonald, JaeWoong Chung, Hassan Chafi, Chi...
EUROPAR
2010
Springer
15 years 7 months ago
Optimized Dense Matrix Multiplication on a Many-Core Architecture
Traditional parallel programming methodologies for improving performance assume cache-based parallel systems. However, new architectures, like the IBM Cyclops-64 (C64), b...
Elkin Garcia, Ioannis E. Venetis, Rishi Khan, Guan...
CPHYSICS
2006
15 years 6 months ago
Genetically controlled random search: a global optimization method for continuous multidimensional functions
A new stochastic method for locating the global minimum of a multidimensional function inside a rectangular hyperbox is presented. A sampling technique is employed that makes use ...
Ioannis G. Tsoulos, Isaac E. Lagaris
SOFTVIS
2003
ACM
15 years 11 months ago
Interactive Locality Optimization on NUMA Architectures
Optimizing the performance of shared-memory NUMA programs remains something of a black art, requiring that application writers possess deep understanding of their programs’ beha...
Tao Mu, Jie Tao, Martin Schulz, Sally A. McKee