The Cray X1 was recently introduced as the first in a new line of parallel systems to combine high-bandwidth vector processing with an MPP system architecture. Alongside capabili...
Christian Bell, Wei-Yu Chen, Dan Bonachea, Katheri...
Until recently, a steadily rising clock rate and other uniprocessor microarchitectural improvements could be relied upon to consistently deliver increasing performance for a wide ...
Guilherme Ottoni, Ram Rangan, Adam Stoler, David I...
Abstract. Empirical optimizers like ATLAS have been very effective in optimizing computational kernels in libraries. The best choice of parameters such as tile size and degree of l...
This paper presents a new operation chaining reconfigurable scheduling algorithm (CRS) based on list scheduling that maximizes instruction level parallelism available in distribut...
Ying Yi, Ioannis Nousias, Mark Milward, Sami Khawa...
Field programmable gate arrays (FPGAs), graphics processing units (GPUs) and Sony’s Playstation 2 vector units offer scope for hardware acceleration of applications. Implementin...
Lee W. Howes, Paul Price, Oskar Mencer, Olav Beckm...