Tuning a configurable cache subsystem to an application can greatly reduce memory hierarchy energy consumption. Previous tuning methods use a level one configurable cache only, or...
The Spidergon interconnection network has become popular recently in multiprocessor systems on chips. To the best of our knowledge, algorithms for collective communications (CC) h...
Abstract. Finite volume numerical methods have been widely studied, implemented and parallelized on multiprocessor systems or on clusters. Modern graphics processing units (GPU) pr...
In the light of image retrieval evolving from text annotation to content-based and from standalone applications to web-based search engines, we foresee the need for deploying cont...
Computer architects are constantly faced with the need to improve performance and increase the efficiency of computation in their designs. To this end, it is increasingly common ...
Nathan Clark, Amir Hormati, Scott A. Mahlke, Sami ...