Run-time parallelization is often the only way to execute the code in parallel when data dependence information is incomplete at compile time. This situation is common in many imp...
A key challenge in achieving high performance on software DSM systems is overcoming their relatively large communication latencies. In this paper, we consider two techniques which...
We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Contr...
—The potential for destructive interference between running processes is increased as Chip Multiprocessors (CMPs) share more on-chip resources. We believe that understanding the ...
- Processor scheduling in distributed-memory systems has received considerable attention in recent years. Several commercial distributed-memory systems use spacesharing processor s...