Sciweavers

5640 search results - page 453 / 1128
» Parallelizing the Data Cube
IPPS
2010
IEEE
15 years 4 months ago
Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures
This work presents the first extensive study of single-node performance optimization, tuning, and analysis of the fast multipole method (FMM) on modern multicore systems. We consid...
Aparna Chandramowlishwaran, Samuel Williams, Leoni...
IPPS
2010
IEEE
15 years 4 months ago
Operating system resource management
From the point of view of an operating system, a computer is managed and optimized in terms of the application programming model and the management of system resources. For the TF...
Burton Smith
EUROPAR
2011
Springer
14 years 6 months ago
Model-Driven Tile Size Selection for DOACROSS Loops on GPUs
DOALL loops are tiled to exploit DOALL parallelism and data locality on GPUs. In contrast, due to loop-carried dependences, DOACROSS loops must be skewed first in order to make ti...
Peng Di, Jingling Xue
PPAM
2005
Springer
15 years 11 months ago
Adapting Linear Algebra Codes to the Memory Hierarchy Using a Hypermatrix Scheme
We present the way in which we adapt data and computations to the underlying memory hierarchy by means of a hierarchical data structure known as a hypermatrix. The applicat...
José R. Herrero, Juan J. Navarro
ISLPED
2003
ACM
15 years 11 months ago
A low-power VLSI architecture for turbo decoding
Presented in this paper is a low-power architecture for turbo decoding of parallel concatenated convolutional codes. The proposed architecture is derived via the concept of block...
Seok-Jun Lee, Naresh R. Shanbhag, Andrew C. Singer