Sciweavers

IPPS
2008
IEEE
15 years 11 months ago
Faster matrix-vector multiplication on GeForce 8800GTX
Recently, GPUs have acquired the programmability to perform general-purpose computation quickly by running tens of thousands of threads concurrently. This paper presents a new algorithm for d...
N. Fujimoto
LCN
2005
IEEE
15 years 10 months ago
On Reorder Density and its Application to Characterization of Packet Reordering
A formal approach for characterizing, evaluating and modeling packet reordering is presented. Reordering is a phenomenon that is likely to become increasingly common on the Internet,...
Nischal M. Piratla, Anura P. Jayasumana, Tarun Ban...
ICCAD
2005
IEEE
16 years 1 month ago
Code restructuring for improving cache performance of MPSoCs
One of the critical goals in code optimization for MPSoC architectures is to minimize the number of off-chip memory accesses. This is because such accesses can be extremely cos...
Guilin Chen, Mahmut T. Kandemir
HOTI
2005
IEEE
15 years 10 months ago
Zero Copy Sockets Direct Protocol over InfiniBand - Preliminary Implementation and Performance Analysis
Sockets Direct Protocol (SDP) is a byte-stream transport protocol implementing the TCP SOCK_STREAM semantics utilizing transport offloading capabilities of the InfiniBand fabric. ...
Dror Goldenberg, Michael Kagan, Ran Ravid, Michael...
IPPS
2002
IEEE
15 years 10 months ago
JMPI: Implementing the Message Passing Standard in Java
The Message Passing Interface (MPI) standard provides a uniform Application Programmers Interface (API) that abstracts the underlying hardware from the parallel applications. Rece...
Steven Morin, Israel Koren, C. Mani Krishna