Sciweavers

3379 search results - page 423 / 676
» Parallel cross-entropy optimization
Sort
View
HPCA
2000
IEEE
15 years 7 months ago
Improving the Throughput of Synchronization by Insertion of Delays
Efficiency of synchronization mechanisms can limit the parallel performance of many shared-memory applications. In addition, the ever increasing performance gap between processor...
Ravi Rajwar, Alain Kägi, James R. Goodman
HPCA
2000
IEEE
15 years 7 months ago
Register Organization for Media Processing
Processor architectures with tens to hundreds of arithmetic units are emerging to handle media processing applications. These applications, such as image coding, image synthesis, ...
Scott Rixner, William J. Dally, Brucek Khailany, P...
IPPS
2000
IEEE
15 years 7 months ago
Reducing Ownership Overhead for Load-Store Sequences in Cache-Coherent Multiprocessors
Parallel programs that modify shared data in a cachecoherent multiprocessor with a write-invalidate coherence protocol create ownership overhead in the form of ownership acquisiti...
Jim Nilsson, Fredrik Dahlgren
PPOPP
1999
ACM
15 years 7 months ago
MagPIe: MPI's Collective Communication Operations for Clustered Wide Area Systems
Writing parallel applications for computational grids is a challenging task. To achieve good performance, algorithms designed for local area networks must be adapted to the differ...
Thilo Kielmann, Rutger F. H. Hofman, Henri E. Bal,...
IPPS
1998
IEEE
15 years 7 months ago
Partitioned Schedules for Clustered VLIW Architectures
This paper presents results on a new approach to partitioning a modulo-scheduled loop for distributed execution on parallel clusters of functional units organized as a VLIW machin...
Marcio Merino Fernandes, Josep Llosa, Nigel P. Top...