Performance loss due to long-latency memory accesses can be reduced by servicing multiple memory accesses concurrently. The notion of generating and servicing long-latency cache m...
Moinuddin K. Qureshi, Daniel N. Lynch, Onur Mutlu,...
good data partitioning scheme is the need of the time. However it is very diflcult to arrive at a good solution as the number of possible dutupartitionsfor a given real lifeprogra...
The Cray X1 was recently introduced as the first in a new line of parallel systems to combine high-bandwidth vector processing with an MPP system architecture. Alongside capabili...
Christian Bell, Wei-Yu Chen, Dan Bonachea, Katheri...
OpenMP has emerged as a widely accepted standard for writing shared memory programs. Hardware-specific extensions such as data placement are usually needed to improve the scalabi...
We discuss here a parallel implementation of the visualisation of data from a galaxy formation simulation within the Triana problem-solving environment. The visualisation is a tes...
Ian J. Taylor, Matthew S. Shields, Ian Wang, Roger...