Current data cache organizations fail to deliver high performance in scalar processors for many vector applications. There are two main reasons for this loss of performance: the u...
—Atomic operations are important building blocks in supporting general-purpose computing on graphics processing units (GPUs). For instance, they can be used to coordinate executi...
Various studies have shown that OS jitter can degrade parallel program performance considerably at large processor counts. Most sources of system jitter fall broadly into 5 catego...
We describe the Paraflow system for connecting heterogeneous computing services together into a flexible and efficient data-mining metacomputer. There are three levels of parallel...
Traditionally, scheduling in high-end parallel systems focuses on how to minimize the average job waiting time and on how to maximize the overall system utilization. Despite the d...