This work presents the first extensive study of singlenode performance optimization, tuning, and analysis of the fast multipole method (FMM) on modern multicore systems. We consid...
Aparna Chandramowlishwaran, Samuel Williams, Leoni...
From the point of view of an operating system, a computer is managed and optimized in terms of the application programming model and the management of system resources. For the TF...
DOALL loops are tiled to exploit DOALL parallelism and data locality on GPUs. In contrast, due to loop-carried dependences, DOACROSS loops must be skewed first in order to make ti...
Abstract. We present the way in which we adapt data and computations to the underlying memory hierarchy by means of a hierarchical data structure known as hypermatrix. The applicat...
Presented in this paper is a low-power architecture for turbo decodings of parallel concatenated convolutional codes. The proposed architecture is derived via the concept of block...
Seok-Jun Lee, Naresh R. Shanbhag, Andrew C. Singer