Sciweavers

111 search results - page 18 / 23
» Extending SRT for parallel applications in tiled-CMP archite...
Sort
View
101
Voted
EUROPAR
2005
Springer
15 years 3 months ago
Hierarchical Scheduling for Moldable Tasks
The model of moldable task (MT) was introduced some years ago and has been proved to be an efficient way for implementing parallel applications. It considers a target application ...
Pierre-François Dutot
HPCA
2009
IEEE
15 years 10 months ago
Design and implementation of software-managed caches for multicores with local memory
Heterogeneous multicores, such as Cell BE processors and GPGPUs, typically do not have caches for their accelerator cores because coherence traffic, cache misses, and latencies fr...
Sangmin Seo, Jaejin Lee, Zehra Sura
ISCA
2008
IEEE
148views Hardware» more  ISCA 2008»
15 years 4 months ago
Atomic Vector Operations on Chip Multiprocessors
The current trend is for processors to deliver dramatic improvements in parallel performance while only modestly improving serial performance. Parallel performance is harvested th...
Sanjeev Kumar, Daehyun Kim, Mikhail Smelyanskiy, Y...
SOSP
1997
ACM
14 years 11 months ago
Towards Transparent and Efficient Software Distributed Shared Memory
Despite a large research effort, software distributed shared memory systems have not been widely used to run parallel applications across clusters of computers. The higher perform...
Daniel J. Scales, Kourosh Gharachorloo
CODES
2007
IEEE
15 years 4 months ago
A code-generator generator for multi-output instructions
We address the problem of instruction selection for Multi-Output Instructions (MOIs), producing more than one result. Such inherently parallel hardware instructions are very commo...
Hanno Scharwächter, Jonghee M. Youn, Rainer L...