Sciweavers

FCCM
2005
IEEE
13 years 10 months ago
High-Performance FPGA-Based General Reduction Methods
FPGA-based floating-point kernels must exploit algorithmic parallelism and use deeply pipelined cores to gain a performance advantage over general-purpose processors. Inability t...
Gerald R. Morris, Ling Zhuo, Viktor K. Prasanna
CCGRID
2007
IEEE
13 years 10 months ago
High-Performance MPI Broadcast Algorithm for Grid Environments Utilizing Multi-lane NICs
The performance of MPI collective operations, such as broadcast and reduction, is heavily affected by network topologies, especially in grid environments. Many techniques to cons...
Tatsuhiro Chiba, Toshio Endo, Satoshi Matsuoka
TPDS
2010
13 years 2 months ago
Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures
The objective of this paper is to extend, in the context of multicore architectures, the concepts of tile algorithms [Buttari et al., 2007] for Cholesky, LU, QR factorizations to t...
Hatem Ltaief, Jakub Kurzak, Jack Dongarra
SBCCI
2005
ACM
13 years 10 months ago
Total leakage power optimization with improved mixed gates
Gate oxide tunneling current Igate and sub-threshold current Isub dominate the leakage of designs. The latter depends on threshold voltage Vth while Igate varies with the thickness ...
Frank Sill, Frank Grassert, Dirk Timmermann
DAC
2010
ACM
13 years 2 months ago
Non-uniform clock mesh optimization with linear programming buffer insertion
Clock meshes are extremely effective at filtering clock skew from environmental and process variations. For this reason, clock meshes are used in most high performance designs. Ho...
Matthew R. Guthaus, Gustavo Wilke, Ricardo Reis