Sciweavers

SPAA
1992
ACM
15 years 1 month ago
Subset Barrier Synchronization on a Private-Memory Parallel System
A global barrier synchronizes all processors in a parallel system. This paper investigates algorithms that allow disjoint subsets of processors to synchronize independently and in...
Anja Feldmann, Thomas R. Gross, David R. O'Hallaro...
ARCS
2008
Springer
14 years 12 months ago
Hybrid Parallel Sort on the Cell Processor
Sorting large data sets has always been an important application, and hence has been one of the benchmark applications on new parallel architectures. We present a parallel sortin...
Jörg Keller, Christoph W. Kessler, Kalle Kö...
MICRO
2010
IEEE
14 years 7 months ago
Improving SIMT Efficiency of Global Rendering Algorithms with Architectural Support for Dynamic Micro-Kernels
Wide Single Instruction, Multiple Thread (SIMT) architectures often require a static allocation of thread groups that are executed in lockstep throughout the entire application ker...
Michael Steffen, Joseph Zambreno
TCAD
2002
14 years 9 months ago
An instruction-level energy model for embedded VLIW architectures
In this paper, an instruction-level energy model is proposed for the data-path of very long instruction word (VLIW) pipelined processors that can be used to provide accurate power ...
Mariagiovanna Sami, Donatella Sciuto, Cristina Sil...
VRIPHYS
2010
14 years 4 months ago
Asynchronous Preconditioners for Efficient Solving of Non-linear Deformations
In this paper, we present a set of methods to improve numerical solvers, as used in real-time non-linear deformable models based on implicit integration schemes. The proposed appr...
Hadrien Courtecuisse, Jérémie Allard...