Sciweavers

22 search results - page 3 / 5
» Memory efficient parallel matrix multiplication operation fo...
Sort
View
PPOPP
2010
ACM
14 years 2 months ago
Scalable communication protocols for dynamic sparse data exchange
Many large-scale parallel programs follow a bulk synchronous parallel (BSP) structure with distinct computation and communication phases. Although the communication phase in such ...
Torsten Hoefler, Christian Siebert, Andrew Lumsdai...
ARC
2012
Springer
317views Hardware» more  ARC 2012»
12 years 1 months ago
A High Throughput FPGA-Based Implementation of the Lanczos Method for the Symmetric Extremal Eigenvalue Problem
Iterative numerical algorithms with high memory bandwidth requirements but medium-size data sets (matrix size ∼ a few 100s) are highly appropriate for FPGA acceleration. This pap...
Abid Rafique, Nachiket Kapre, George A. Constantin...
IPPS
1999
IEEE
13 years 9 months ago
Reducing I/O Complexity by Simulating Coarse Grained Parallel Algorithms
Block-wise access to data is a central theme in the design of efficient external memory (EM) algorithms. A second important issue, when more than one disk is present, is fully par...
Frank K. H. A. Dehne, David A. Hutchinson, Anil Ma...
CF
2009
ACM
13 years 12 months ago
Scheduling dynamic parallelism on accelerators
Resource management on accelerator based systems is complicated by the disjoint nature of the main CPU and accelerator, which involves separate memory hierarhcies, different degr...
Filip Blagojevic, Costin Iancu, Katherine A. Yelic...
CLUSTER
2005
IEEE
13 years 11 months ago
Memory Management Support for Multi-Programmed Remote Direct Memory Access (RDMA) Systems
Current operating systems offer basic support for network interface controllers (NICs) supporting remote direct memory access (RDMA). Such support typically consists of a device d...
Kostas Magoutis