Sciweavers

IISWC
2009
IEEE
Phoenix rebirth: Scalable MapReduce on a large-scale shared-memory system
Abstract—Dynamic runtimes can simplify parallel programming by automatically managing concurrency and locality without further burdening the programmer. Nevertheless, implementin...
Richard M. Yoo, Anthony Romano, Christos Kozyrakis
ICS
2009
Tsinghua U.
Computer generation of fast Fourier transforms for the Cell Broadband Engine
The Cell BE is a multicore processor with eight vector accelerators (called SPEs) that implement explicit cache management through direct memory access engines. While the Cell has...
Srinivas Chellappa, Franz Franchetti, Markus Pü...
IEEEPACT
2009
IEEE
Interprocedural Load Elimination for Dynamic Optimization of Parallel Programs
Abstract—Load elimination is a classical compiler transformation that is increasing in importance for multi-core and many-core architectures. The effect of the transformation is ...
Rajkishore Barik, Vivek Sarkar
FCCM
2011
IEEE
Synthesis of Platform Architectures from OpenCL Programs
The problem of automatically generating hardware modules from a high level representation of an application has been at the research forefront in the last few years. In this pap...
Muhsen Owaida, Nikolaos Bellas, Konstantis Dalouka...
HPCA
2009
IEEE
Design and implementation of software-managed caches for multicores with local memory
Heterogeneous multicores, such as Cell BE processors and GPGPUs, typically do not have caches for their accelerator cores because coherence traffic, cache misses, and latencies fr...
Sangmin Seo, Jaejin Lee, Zehra Sura