Sciweavers

779 search results - page 147 / 156
» A Simple Program Transformation for Parallelism
ASPLOS
2010
ACM
15 years 4 months ago
Flexible architectural support for fine-grain scheduling
To make efficient use of CMPs with tens to hundreds of cores, it is often necessary to exploit fine-grain parallelism. However, managing tasks of a few thousand instructions is ...
Daniel Sanchez, Richard M. Yoo, Christos Kozyrakis
PLDI
2005
ACM
15 years 3 months ago
Register allocation for software pipelined multi-dimensional loops
Software pipelining of a multi-dimensional loop is an important optimization that overlaps the execution of successive outermost loop iterations to explore instruction-level paral...
Hongbo Rong, Alban Douillet, Guang R. Gao
ISCAPDCS
2001
14 years 11 months ago
End-user Tools for Application Performance Analysis Using Hardware Counters
One purpose of the end-user tools described in this paper is to give users a graphical representation of performance information that has been gathered by instrumenting an applica...
Kevin S. London, Jack Dongarra, Shirley Moore, Phi...
JSA
2000
14 years 9 months ago
Distributed vector architectures
Integrating processors and main memory is a promising approach to increase system performance. Such integration provides very high memory bandwidth that can be exploited efficientl...
Stefanos Kaxiras
PPL
2011
14 years 9 days ago
MPI on Millions of Cores
Petascale parallel computers with more than a million processing cores are expected to be available in a couple of years. Although MPI is the dominant programming interface today ...
Pavan Balaji, Darius Buntinas, David Goodell, Will...