Abstract—This paper proposes a new software-oriented approach for managing the distributed shared L2 caches of a chip multiprocessor (CMP) for latency-oriented multithreaded appl...
Store misses cause significant delays in shared-memory multiprocessors because of limited store buffering and ordering constraints required for proper synchronization. Today, prog...
Thomas F. Wenisch, Anastassia Ailamaki, Babak Fals...
During the past few years, embedded digital systems have been requested to provide a huge amount of processing power and functionality. A very likely foreseeable step to pursue th...
It has been shown that a small number of FPGAs can significantly accelerate certain computing tasks by up to two or three orders of magnitude. However, particularly intensive lar...
Arun Patel, Christopher A. Madill, Manuel Salda&nt...
Synchronization algorithms that are efficient across a wide range of applications and operating conditions are hard to design because their performance depends on unpredictable ru...