Sciweavers

IEEEPACT
2006
IEEE
13 years 10 months ago
Overlapping dependent loads with addressless preload
Modern out-of-order processors with non-blocking caches exploit Memory-Level Parallelism (MLP) by overlapping cache misses in a wide instruction window. The exploitation of MLP, h...
Zhen Yang, Xudong Shi, Feiqi Su, Jih-Kwon Peir
IEEEPACT
2006
IEEE
Hardware support for spin management in overcommitted virtual machines
Multiprocessor operating systems (OSs) pose several unique and conflicting challenges to System Virtual Machines (System VMs). For example, most existing system VMs resort to gan...
Philip M. Wells, Koushik Chakraborty, Gurindar S. ...
IEEEPACT
2006
IEEE
Adaptive reorder buffers for SMT processors
In SMT processors, the complex interplay between private and shared datapath resources needs to be considered in order to realize the full performance potential. In this paper, we...
Joseph J. Sharkey, Deniz Balkan, Dmitry Ponomarev
IEEEPACT
2006
IEEE
Branch predictor guided instruction decoding
Fast instruction decoding is a challenge for the design of CISC microprocessors. A well-known solution to overcome this problem is using a trace cache. It stores and fetches alrea...
Oliverio J. Santana, Ayose Falcón, Alex Ram...
IEEEPACT
2006
IEEE
Region array SSA
Static Single Assignment (SSA) has become the intermediate program representation of choice in most modern compilers because it enables efficient data flow analysis of scalars an...
Silvius Rus, Guobin He, Christophe Alias, Lawrence...
IEEEPACT
2006
IEEE
Two-level mapping based cache index selection for packet forwarding engines
Packet forwarding is a memory-intensive application requiring multiple accesses through a trie structure. The efficiency of a cache for this application critically depends on the ...
Kaushik Rajan, Ramaswamy Govindarajan
IEEEPACT
2006
IEEE
Architectural support for operating system-driven CMP cache management
The role of the operating system (OS) in managing shared resources such as CPU time, memory, peripherals, and even energy is well motivated and understood [23]. Unfortunately, one...
Nauman Rafique, Won-Taek Lim, Mithuna Thottethodi
IEEEPACT
2006
IEEE
Fast, automatic, procedure-level performance tuning
This paper presents an automated performance tuning solution, which partitions a program into a number of tuning sections and finds the best combination of compiler options for e...
Zhelong Pan, Rudolf Eigenmann
IEEEPACT
2006
IEEE
Whole-program optimization of global variable layout
On machines with high-performance processors, the memory system continues to be a performance bottleneck. Compilers insert prefetch operations and reorder data accesses to improve...
Nathaniel McIntosh, Sandya Mannarswamy, Robert Hun...