Programming network processors remains a challenging task since their birth until recently when high-level programming environments for them are emerging. By employing domain speci...
Tao Liu, Xiao-Feng Li, Lixia Liu, Chengyong Wu, Ro...
In this paper we present the internal representation and optimizations used by the CASH compiler for improving the memory parallelism of pointer-based programs. CASH uses an SSA-b...
Predicated execution can eliminate hard to predict branches and help to enable instruction level parallelism. Many current predication variants exist where the result update is co...
Increasing health care costs put a great strain on national economies. In recent years there have been several national and regional research and development projects in Finland a...
Despite large caches, main-memory access latencies still cause significant performance losses in many applications. Numerous hardware and software prefetching schemes tolerate th...
Zhenlin Wang, Doug Burger, Steven K. Reinhardt, Ka...