The growing dominance of wire delays at future technology points renders a microprocessor communication-bound. Clustered microarchitectures allow most dependence chains to execute...
While previous CPU- or memory-centric load balancing schemes are capable of achieving the effective usage of global CPU and memory resources in a cluster system, the cluster exhib...
Xiao Qin, Hong Jiang, Yifeng Zhu, David R. Swanson
In this paper, we describe a compilation system that automates much of the process of performance tuning that is currently done manually by application programmers interested in h...
Nastaran Baradaran, Jacqueline Chame, Chun Chen, P...
In this work, to render at least 5123 voxel volumes in real-time, we have developed a sort-last parallel volume rendering method for distributed memory multiprocessors. Our sort-l...
Percolation has recently been proposed as a key component of an advanced program execution model for future generation high-end machines featuring adaptive data/code transformatio...
Adeline Jacquet, Vincent Janot, Clement Leung, Gua...