Sciweavers

2609 search results - page 330 / 522
» Optimizing for parallelism and data locality
FCCM
2006
IEEE
15 years 11 months ago
Hardware/Software Integration for FPGA-based All-Pairs Shortest-Paths
Field-Programmable Gate Arrays (FPGAs) are being employed in high performance computing systems owing to their potential to accelerate a wide variety of long-running routines. Par...
Uday Bondhugula, Ananth Devulapalli, James Dinan, ...
IPPS
2002
IEEE
15 years 10 months ago
Efficient Pipelining of Nested Loops: Unroll-and-Squash
The size and complexity of current custom VLSI have forced the use of high-level programming languages to describe hardware, and compiler and synthesis technology to map abstract designs ...
Darin Petkov, Randolph E. Harr, Saman P. Amarasing...
JACM
2006
15 years 5 months ago
Distribution sort with randomized cycling
Parallel independent disks can enhance the performance of external memory (EM) algorithms, but the programming task is often difficult. In this paper we develop randomized variants ...
Jeffrey Scott Vitter, David A. Hutchinson
ANOR
2010
15 years 2 months ago
Greedy scheduling with custom-made objectives
We present a methodology to automatically generate an online job scheduling method for a custom-made objective and real workloads. The scheduling problem comprises independent para...
Carsten Franke, Joachim Lepping, Uwe Schwiegelshoh...
CAP
2010
15 years 4 days ago
Accuracy versus time: a case study with summation algorithms
In this article, we focus on numerical algorithms for which, in practice, parallelism and accuracy do not cohabit well. In order to increase parallelism, expressions are reparsed,...
Philippe Langlois, Matthieu Martel, Laurent Thé...