The excessive complexity of both machine architectures and applications have made it difficult for compilers to statically model and predict application behavior. This observatio...
Qing Yi, Keith Seymour, Haihang You, Richard W. Vu...
Bank locality can be defined as localizing the number of load/store accesses to a small set of memory banks at a given time. An optimizing compiler can modify a given input code t...
Guilin Chen, Mahmut T. Kandemir, Hendra Saputra, M...
Abstract We present an analog and machine-independent algebraic characterization of elementarily computable functions over the real numbers in the sense of recursive analysis: we p...
The most popular architecture for parallel search is work stealing: threads that have run out of work (nodes to be searched) steal from threads that still have work. Work stealing ...
: In scalable parallel machines, processors can make local memory accesses much faster than they can make remote memory accesses. In addition, when a number of remote accesses must...