Sciweavers

567 search results - page 28 / 114
» Program Optimization and Parallelization Using Idioms
Sort
View
PLDI
2012
ACM
13 years 5 days ago
Adaptive input-aware compilation for graphics engines
While graphics processing units (GPUs) provide low-cost and efficient platforms for accelerating high performance computations, the tedious process of performance tuning required...
Mehrzad Samadi, Amir Hormati, Mojtaba Mehrara, Jan...
IEEEPACT
2005
IEEE
15 years 3 months ago
Characterization of TCC on Chip-Multiprocessors
Transactional Coherence and Consistency (TCC) is a novel coherence scheme for shared memory multiprocessors that uses programmer-defined transactions as the fundamental unit of p...
Austen McDonald, JaeWoong Chung, Hassan Chafi, Chi...
EUROPAR
2010
Springer
14 years 10 months ago
Optimized Dense Matrix Multiplication on a Many-Core Architecture
Abstract. Traditional parallel programming methodologies for improving performance assume cache-based parallel systems. However, new architectures, like the IBM Cyclops-64 (C64), b...
Elkin Garcia, Ioannis E. Venetis, Rishi Khan, Guan...
CPHYSICS
2006
204views more  CPHYSICS 2006»
14 years 9 months ago
Genetically controlled random search: a global optimization method for continuous multidimensional functions
A new stochastic method for locating the global minimum of a multidimensional function inside a rectangular hyperbox is presented. A sampling technique is employed that makes use ...
Ioannis G. Tsoulos, Isaac E. Lagaris
SOFTVIS
2003
ACM
15 years 3 months ago
Interactive Locality Optimization on NUMA Architectures
Optimizing the performance of shared-memory NUMA programs remains something of a black art, requiring that application writers possess deep understanding of their programs’ beha...
Tao Mu, Jie Tao, Martin Schulz, Sally A. McKee