Sciweavers

182 search results - page 32 / 37
» Scheduling complex streaming applications on the Cell proces...
Sort
View
78
Voted
HPCA
2006
IEEE
15 years 10 months ago
Store vectors for scalable memory dependence prediction and scheduling
Allowing loads to issue out-of-order with respect to earlier unresolved store addresses is very important for extracting parallelism in large-window superscalar processors. Blindl...
Samantika Subramaniam, Gabriel H. Loh
HPCA
2009
IEEE
15 years 10 months ago
Design and implementation of software-managed caches for multicores with local memory
Heterogeneous multicores, such as Cell BE processors and GPGPUs, typically do not have caches for their accelerator cores because coherence traffic, cache misses, and latencies fr...
Sangmin Seo, Jaejin Lee, Zehra Sura
PLDI
2012
ACM
13 years 2 days ago
Adaptive input-aware compilation for graphics engines
While graphics processing units (GPUs) provide low-cost and efficient platforms for accelerating high performance computations, the tedious process of performance tuning required...
Mehrzad Samadi, Amir Hormati, Mojtaba Mehrara, Jan...
IJPP
2006
145views more  IJPP 2006»
14 years 9 months ago
Deterministic Parallel Processing
Abstract. In order to address the problems faced in the wireless communications domain, picoChip has devised the picoArrayTM . The picoArrayTM is a tiled-processor architecture, co...
Gajinder Panesar, Daniel Towner, Andrew Duller, Al...
ASPLOS
2010
ACM
15 years 27 days ago
Micro-pages: increasing DRAM efficiency with locality-aware data placement
Power consumption and DRAM latencies are serious concerns in modern chip-multiprocessor (CMP or multi-core) based compute systems. The management of the DRAM row buffer can signif...
Kshitij Sudan, Niladrish Chatterjee, David Nellans...