Availability and performance can be improved using data replication as the basic technique, thus benefitting the services provided by a distributed system. However, replication im...
We are attacking the memory bottleneck by building a “smart” memory controller that improves effective memory bandwidth, bus utilization, and cache efficiency by letting appl...
Binu K. Mathew, Sally A. McKee, John B. Carter, Al...
In this paper, we propose a novel hardware caching technique, called switch directory, to reduce the communication latency in CC-NUMA multiprocessors. The main idea is to implemen...
Data locality and synchronization overhead are two important factors that affect the performance of applications on multiprocessors. Loop fusion is an effective way for reducing s...
Edwin Hsing-Mean Sha, Chenhua Lang, Nelson L. Pass...
Abstract--This paper proposes an analytical model to estimate the cost of running an affinity-based thread schedule on multicore systems. The model consists of three submodels to e...