Sciweavers

4198 search results - page 189 / 840
» Data Parallel Program Design
Sort
View
137
Voted
ICPP
2008
IEEE
15 years 9 months ago
Taming Single-Thread Program Performance on Many Distributed On-Chip L2 Caches
This paper presents a two-part study on managing distributed NUCA (Non-Uniform Cache Architecture) L2 caches in a future manycore processor to obtain high singlethread program per...
Lei Jin, Sangyeun Cho
IFL
2004
Springer
131views Formal Methods» more  IFL 2004»
15 years 8 months ago
Exploiting Single-Assignment Properties to Optimize Message-Passing Programs by Code Transformations
The message-passing paradigm is now widely accepted and used mainly for inter-process communication in distributed memory parallel systems. However, one of its disadvantages is the...
Alfredo Cristóbal-Salas, Andrey Chernykh, E...
HICSS
1995
IEEE
128views Biometrics» more  HICSS 1995»
15 years 6 months ago
Instruction Level Parallelism
Abstract. We reexamine the limits of parallelism available in programs, using runtime reconstruction of program data-flow graphs. While limits of parallelism have been examined in...
LCPC
2000
Springer
15 years 6 months ago
Automatic Coarse Grain Task Parallel Processing on SMP Using OpenMP
This paper proposes a simple and efficient implementation method for a hierarchical coarse grain task parallel processing scheme on a SMP machine. OSCAR multigrain parallelizing c...
Hironori Kasahara, Motoki Obata, Kazuhisa Ishizaka
IPPS
2010
IEEE
15 years 1 months ago
Out-of-core distribution sort in the FG programming environment
We describe the implementation of an out-of-core, distribution-based sorting program on a cluster using FG, a multithreaded programming framework. FG mitigates latency from disk-I/...
Priya Natarajan, Thomas H. Cormen, Elena Riccio St...