Communication latencies within critical sections constitute a major bottleneck in some classes of emerging parallel workloads. In this paper, we argue for the use of Inferentially...
Conventional programming models were designed to be used by expert programmers for programming for largescale multiprocessors, distributed computational clusters, or specialized p...
We describe the Slice Processor micro-architecture that implements a generalized operation-based prefetching mechanism. Operation-based prefetchers predict the series of operation...
Andreas Moshovos, Dionisios N. Pnevmatikatos, Amir...
Abstract—Sharing patterns in shared-memory multiprocessors are the key to performance: uniprocessor latencytolerating techniques such as out-of-order execution and non-blocking c...
Abstract— Content replication and distribution is an effective technology to reduce the response time for web accesses and has been proven quite popular among large Internet cont...