Sciweavers

472 search results - page 70 / 95
» Shared memory programming for large scale machines
SC
2000
ACM
15 years 2 months ago
Performance Modeling and Tuning of an Unstructured Mesh CFD Application
This paper describes performance tuning experiences with a three-dimensional unstructured grid Euler flow code from NASA, which we have reimplemented in the PETSc framework and p...
William Gropp, Dinesh K. Kaushik, David E. Keyes, ...
IPPS
2010
IEEE
14 years 7 months ago
Highly scalable parallel sorting
Sorting is a commonly used process with a wide breadth of applications in the high performance computing field. Early research in parallel processing has provided us with comprehen...
Edgar Solomonik, Laxmikant V. Kalé
PPL
2011
14 years 15 days ago
MPI on Millions of Cores
Petascale parallel computers with more than a million processing cores are expected to be available in a couple of years. Although MPI is the dominant programming interface today ...
Pavan Balaji, Darius Buntinas, David Goodell, Will...
ISCA
2005
IEEE
15 years 3 months ago
Scalable Load and Store Processing in Latency Tolerant Processors
Memory latency tolerant architectures support thousands of in-flight instructions without scaling cycle-critical processor resources, and thousands of useful instructions can compl...
Amit Gandhi, Haitham Akkary, Ravi Rajwar, Srikanth...
PLDI
2009
ACM
15 years 4 months ago
LiteRace: effective sampling for lightweight data-race detection
Data races are one of the most common and subtle causes of pernicious concurrency bugs. Static techniques for preventing data races are overly conservative and do not scale well t...
Daniel Marino, Madanlal Musuvathi, Satish Narayana...