Sciweavers

125 search results - page 25 / 25
» Loop Striping: Maximize Parallelism for Nested Loops
VLSID
2007
IEEE
15 years 10 months ago
On the Impact of Address Space Assignment on Performance in Systems-on-Chip
Today, VLSI systems for computationally demanding applications are being built as Systems-on-Chip (SoCs) with a distributed memory sub-system which is shared by a large number of ...
G. Hazari, Madhav P. Desai, H. Kasture
SIAMSC
2008
14 years 10 months ago
Accurate Floating-Point Summation Part I: Faithful Rounding
Given a vector of floating-point numbers with exact sum s, we present an algorithm for calculating a faithful rounding of s, i.e. the result is one of the immediate floating-point ...
Siegfried M. Rump, Takeshi Ogita, Shin'ichi Oishi
CF
2009
ACM
15 years 4 months ago
Mapping the LU decomposition on a many-core architecture: challenges and solutions
Recently, multi-core architectures with alternative memory subsystem designs have emerged. Instead of using hardware-managed cache hierarchies, they employ software-managed embedde...
Ioannis E. Venetis, Guang R. Gao
IEEEPACT
2006
IEEE
15 years 4 months ago
Compiling for stream processing
This paper describes a compiler for stream programs that efficiently schedules computational kernels and stream memory operations, and allocates on-chip storage. Our compiler uses...
Abhishek Das, William J. Dally, Peter R. Mattson
SIAMSC
2008
14 years 10 months ago
Accurate Floating-Point Summation Part II: Sign, K-Fold Faithful and Rounding to Nearest
In this Part II of this paper we first refine the analysis of error-free vector transformations presented in Part I. Based on that we present an algorithm for calculating the round...
Siegfried M. Rump, Takeshi Ogita, Shin'ichi Oishi