Sciweavers

2609 search results - page 325 / 522
» Optimizing for parallelism and data locality
IWMM
2011
Springer
14 years 8 months ago
Memory management in NUMA multicore systems: trapped between cache contention and interconnect overhead
Multiprocessors based on processors with multiple cores usually include a non-uniform memory architecture (NUMA); even current 2-processor systems with 8 cores exhibit non-uniform...
Zoltan Majo, Thomas R. Gross
PDP
2008
IEEE
15 years 11 months ago
Out-of-Core Wavefront Computations with Reduced Synchronization
Matrix computation algorithms often exhibit dependencies between neighboring elements inside loop nests such that the frontier between computed elements and those to be computed w...
Pierre-Nicolas Clauss, Jens Gustedt, Fréd...
CCGRID
2008
IEEE
15 years 11 months ago
Joint Communication and Computation Task Scheduling in Grids
In this paper we present a multicost algorithm for the joint time scheduling of the communication and computation resources that will be used by a task. The proposed algorithm sel...
Kostas Christodoulopoulos, Nikolaos D. Doulamis, E...
ICPADS
2006
IEEE
15 years 11 months ago
iDIBS: An Improved Distributed Backup System
iDIBS is a peer-to-peer backup system which optimizes the Distributed Internet Backup System (DIBS). iDIBS offers increased reliability by enhancing the robustness of existing pac...
Faruck Morcos, Thidapat Chantem, Philip Little, Ti...
ICS
1999
Tsinghua U.
15 years 9 months ago
Software trace cache
This paper explores the use of compiler optimizations which optimize the layout of instructions in memory. The target is to enable the code to make better use of the underlying ...
Alex Ramírez, Josep-Lluis Larriba-Pey, Carl...