Sciweavers

» Performance scalability of decoupled software pipelining
FCCM
2007
IEEE
13 years 7 months ago
Sparse Matrix-Vector Multiplication Design on FPGAs
Creating a high throughput sparse matrix vector multiplication (SpMxV) implementation depends on a balanced system design. In this paper, we introduce the innovative SpMxV Solver ...
Junqing Sun, Gregory D. Peterson, Olaf O. Storaasl...
ASPLOS
2004
ACM
13 years 11 months ago
FAB: building distributed enterprise disk arrays from commodity components
This paper describes the design, implementation, and evaluation of a Federated Array of Bricks (FAB), a distributed disk array that provides the reliability of traditional enterpr...
Yasushi Saito, Svend Frølund, Alistair C. V...
VLDB
2007
ACM
14 years 5 months ago
Executing Stream Joins on the Cell Processor
Low-latency and high-throughput processing are key requirements of data stream management systems (DSMSs). Hence, multi-core processors that provide high aggregate processing capa...
Bugra Gedik, Philip S. Yu, Rajesh Bordawekar
IPPS
2002
IEEE
13 years 10 months ago
Can User-Level Protocols Take Advantage of Multi-CPU NICs?
Modern high speed interconnects such as Myrinet and Gigabit Ethernet have shifted the bottleneck in communication from the interconnect to the messaging software at the sending an...
Piyush Shivam, Pete Wyckoff, Dhabaleswar K. Panda
PARA
1995
Springer
13 years 9 months ago
A Proposal for a Set of Parallel Basic Linear Algebra Subprograms
This paper describes a proposal for a set of Parallel Basic Linear Algebra Subprograms (PBLAS). The PBLAS are targeted at distributed vector-vector, matrix-vector and matrix-matrix...
Jaeyoung Choi, Jack Dongarra, Susan Ostrouchov, An...