Sciweavers

20 search results - page 3 / 4
» MPC-MPI: An MPI Implementation Reducing the Overall Memory C...
Sort
View
PC
2006
124views Management» more  PC 2006»
13 years 6 months ago
Message-passing code generation for non-rectangular tiling transformations
Tiling is a well known loop transformation used to reduce communication overhead in distributed memory machines. Although a lot of theoretical research has been done concerning th...
Georgios I. Goumas, Nikolaos Drosinos, Maria Athan...
EUROPAR
2010
Springer
13 years 6 months ago
Optimized On-Chip-Pipelined Mergesort on the Cell/B.E
Abstract. Limited bandwidth to off-chip main memory is a performance bottleneck in chip multiprocessors for streaming computations, such as Cell/B.E., and this will become even mor...
Rikard Hultén, Christoph W. Kessler, Jö...
ANCS
2009
ACM
13 years 3 months ago
Range Tries for scalable address lookup
In this paper we introduce the Range Trie, a new multiway tree data structure for address lookup. Each Range Trie node maps to an address range [Na, Nb) and performs multiple comp...
Ioannis Sourdis, Georgios Stefanakis, Ruben de Sme...
DAC
2000
ACM
14 years 7 months ago
Code compression for low power embedded system design
erse approaches at all levels of abstraction starting from the physical level up to the system level. Experience shows that a highlevel method may have a larger impact since the de...
Haris Lekatsas, Jörg Henkel, Wayne Wolf
PPOPP
1997
ACM
13 years 10 months ago
Effective Fine-Grain Synchronization for Automatically Parallelized Programs Using Optimistic Synchronization Primitives
As shared-memory multiprocessors become the dominant commodity source of computation, parallelizing compilers must support mainstream computations that manipulate irregular, point...
Martin C. Rinard