Abstract—Widespread emergence of multicore processors will spur development of parallel applications, exposing programmers to degrees of hardware concurrency hitherto unavailable...
DTA (Decoupled Threaded Architecture) is designed to exploit fine/medium grained Thread Level Parallelism (TLP) by using a distributed hardware scheduling unit and relying on exi...
In this paper, we describe a compilation system that automates much of the process of performance tuning that is currently done manually by application programmers interested in h...
Nastaran Baradaran, Jacqueline Chame, Chun Chen, P...
PM-PVM is a portable implementation of PVM designed to work on SMP architectures supporting multithreading. PM-PVM portability is achieved through the implementation of the PVM fu...
LAPI is a low-level, high-performance communication interface available on the IBM RS/6000 SP system. It provides an activemessage-like interface along with remote memory copy and...
Gautam Shah, Jarek Nieplocha, Jamshed H. Mirza, Ch...