Abstract. We introduce a collection of high-performance kernels for basic linear algebra. The kernels encapsulate small fixed-size computations in order to provide building blocks fo...
For decades, the serialization constraints imposed by true data dependences have been regarded as an absolute limit--the dataflow limit--on the parallel execution of serial progra...
A wait-free implementation of a data object in shared memory is one that guarantees that any process can complete any operation in a finite number of steps, regardless of the execu...
d abstract) Maurice Herlihy, Digital Equipment Corporation, Cambridge Research Laboratory, One Kendall Square, Cambridge, MA 02139 ...
While computing speed continues to increase rapidly, data-access technology is lagging behind. Data-access delay, not processor speed, becomes the leading performance bottlen...