Sciweavers

» Loop Striping: Maximize Parallelism for Nested Loops
VLSID
2007
IEEE
14 years 5 months ago
On the Impact of Address Space Assignment on Performance in Systems-on-Chip
Today, VLSI systems for computationally demanding applications are being built as Systems-on-Chip (SoCs) with a distributed memory sub-system which is shared by a large number of ...
G. Hazari, Madhav P. Desai, H. Kasture
SIAMSC
2008
13 years 5 months ago
Accurate Floating-Point Summation Part I: Faithful Rounding
Given a vector of floating-point numbers with exact sum s, we present an algorithm for calculating a faithful rounding of s, i.e. the result is one of the immediate floating-point ...
Siegfried M. Rump, Takeshi Ogita, Shin'ichi Oishi
CF
2009
ACM
13 years 12 months ago
Mapping the LU decomposition on a many-core architecture: challenges and solutions
Recently, multi-core architectures with alternative memory subsystem designs have emerged. Instead of using hardware-managed cache hierarchies, they employ software-managed embedde...
Ioannis E. Venetis, Guang R. Gao
IEEEPACT
2006
IEEE
13 years 11 months ago
Compiling for stream processing
This paper describes a compiler for stream programs that efficiently schedules computational kernels and stream memory operations, and allocates on-chip storage. Our compiler uses...
Abhishek Das, William J. Dally, Peter R. Mattson
SIAMSC
2008
13 years 5 months ago
Accurate Floating-Point Summation Part II: Sign, K-Fold Faithful and Rounding to Nearest
In this Part II of this paper we first refine the analysis of error-free vector transformations presented in Part I. Based on that we present an algorithm for calculating the round...
Siegfried M. Rump, Takeshi Ogita, Shin'ichi Oishi