Sciweavers

45 search results - page 7 / 9
» Cache Performance Optimizations for Parallel Lattice Boltzma...
Sort
View
IPPS
2006
IEEE
15 years 3 months ago
Conjugate gradient sparse solvers: performance-power characteristics
We characterize the performance and power attributes of the conjugate gradient (CG) sparse solver which is widely used in scientific applications. We use cycle-accurate simulatio...
Konrad Malkowski, Ingyu Lee, Padma Raghavan, Mary ...
ICPP
1999
IEEE
15 years 1 months ago
A Framework for Interprocedural Locality Optimization Using Both Loop and Data Layout Transformations
There has been much work recently on improving the locality performance of loop nests in scientific programs through the use of loop as well as data layout optimizations. However,...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
ICS
2009
Tsinghua U.
15 years 4 months ago
Computer generation of fast fourier transforms for the cell broadband engine
The Cell BE is a multicore processor with eight vector accelerators (called SPEs) that implement explicit cache management through direct memory access engines. While the Cell has...
Srinivas Chellappa, Franz Franchetti, Markus P&uum...
87
Voted
PPOPP
2003
ACM
15 years 2 months ago
Exploiting high-level coherence information to optimize distributed shared state
InterWeave is a distributed middleware system that supports the sharing of strongly typed, pointer-rich data structures across a wide variety of hardware architectures, operating ...
DeQing Chen, Chunqiang Tang, Brandon Sanders, Sand...
ICPP
1999
IEEE
15 years 1 months ago
Optimization of Instruction Fetch for Decision Support Workloads
Instruction fetch bandwidth is feared to be a major limiting factor to the performance of future wide-issue aggressive superscalars. In this paper, we focus on Database applicatio...
Alex Ramírez, Josep-Lluis Larriba-Pey, Carl...