Sciweavers

4934 search results - page 792 / 987
» Implementing an API for Distributed Adaptive Computing Syste...
Sort
View
SPAA
1999
ACM
15 years 6 months ago
Recursive Array Layouts and Fast Parallel Matrix Multiplication
Matrix multiplication is an important kernel in linear algebra algorithms, and the performance of both serial and parallel implementations is highly dependent on the memory system...
Siddhartha Chatterjee, Alvin R. Lebeck, Praveen K....
IPPS
1998
IEEE
15 years 6 months ago
Efficient Runtime Thread Management for the Nano-Threads Programming Model
Abstract. The nano-threads programming model was proposed to effectively integrate multiprogramming on shared-memory multiprocessors, with the exploitation of fine-grain parallelis...
Dimitrios S. Nikolopoulos, Eleftherios D. Polychro...
HPCA
1996
IEEE
15 years 6 months ago
Register File Design Considerations in Dynamically Scheduled Processors
We have investigated the register file requirements of dynamically scheduled processors using register renaming and dispatch queues running the SPEC92 benchmarks. We looked at pro...
Keith I. Farkas, Norman P. Jouppi, Paul Chow
HPCA
1996
IEEE
15 years 6 months ago
Protected, User-Level DMA for the SHRIMP Network Interface
Traditional DMA requires the operating system to perform many tasks to initiate a transfer, with overhead on the order of hundreds or thousands of CPU instructions. This paper des...
Matthias A. Blumrich, Cezary Dubnicki, Edward W. F...
CONCUR
2008
Springer
15 years 4 months ago
Dynamic Partial Order Reduction Using Probe Sets
We present an algorithm for partial order reduction in the context of a countable universe of deterministic actions, of which finitely many are enabled at any given state. This mea...
Harmen Kastenberg, Arend Rensink