Sciweavers

1238 search results - page 220 / 248
» Power Efficient Processor Architecture and The Cell Processo...
EMSOFT
2008
Springer
15 years 1 month ago
A generalized static data flow clustering algorithm for MPSoC scheduling of multimedia applications
In this paper, we propose a generalized clustering approach for static data flow subgraphs mapped onto individual processors in Multi-Processor System on Chips (MPSoCs). The goal ...
Joachim Falk, Joachim Keinert, Christian Haubelt, ...
IPPS
2010
IEEE
14 years 9 months ago
Inter-block GPU communication via fast barrier synchronization
The graphics processing unit (GPU) has evolved from a fixed-function processor with programmable stages to a programmable processor with many fixed-function components that deliver...
Shucai Xiao, Wu-chun Feng
DAC
1999
ACM
16 years 16 days ago
CAD Directions for High Performance Asynchronous Circuits
This paper describes a novel methodology for high performance asynchronous design based on timed circuits and on CAD support for their synthesis using Relative Timing. This method...
Ken S. Stevens, Shai Rotem, Steven M. Burns, Jordi...
HPCA
2008
IEEE
15 years 12 months ago
Serializing instructions in system-intensive workloads: Amdahl's Law strikes again
Serializing instructions (SIs), such as writes to control registers, have many complex dependencies, and are difficult to execute out-of-order (OoO). To avoid unnecessary complexi...
Philip M. Wells, Gurindar S. Sohi
SI3D
2010
ACM
15 years 6 months ago
Parallel Banding Algorithm to compute exact distance transform with the GPU
We propose a Parallel Banding Algorithm (PBA) on the GPU to compute the exact Euclidean Distance Transform (EDT) for a binary image in 2D and higher dimensions. Partitioning the i...
Thanh-Tung Cao, Ke Tang, Anis Mohamed, Tiow Seng T...