While commodity computing and graphics hardware has increased in capacity and dropped in cost, it is still quite difficult to make effective use of such systems for general-purpos...
E. Wes Bethel, Greg Humphreys, Brian E. Paul, J. D...
In this paper, we describe a generalized approach to deriving a custom data layout in multiple memory banks for array-based computations, to facilitate high-bandwidth parallel mem...
To cope with the increasing difference between processor and main memory speeds, modern computer systems use deep memory hierarchies. In the presence of such hierarchies, the perf...
Margaret Martonosi, Anoop Gupta, Thomas E. Anderso...
There are several parallel programming models available for numerical computations at diļ¬erent levels of expressibility and ease of use. For the development of new domain speciļ¬...
āIn this paper, we introduce a runtime, nontrace-based algorithm to compute the critical path profile of the execution of message passing and shared-memory parallel programs. Our...