Sciweavers

656 search results - page 85 / 132
» Scalable Parallel Matrix Multiplication on Distributed Memor...
IPPS
2005
IEEE
15 years 3 months ago
TiNy Threads: A Thread Virtual Machine for the Cyclops64 Cellular Architecture
This paper presents the design and implementation of a thread virtual machine, called TNT (or TiNy-Threads) for the IBM Cyclops64 architecture (the latest Cyclops architecture tha...
Juan del Cuvillo, Weirong Zhu, Ziang Hu, Guang R. ...
PPOPP
2005
ACM
15 years 3 months ago
Revocable locks for non-blocking programming
In this paper we present a new form of revocable lock that streamlines the construction of higher level concurrency abstractions such as atomic multi-word heap updates. The key id...
Tim Harris, Keir Fraser
IWOMP
2009
Springer
15 years 4 months ago
Scalability Evaluation of Barrier Algorithms for OpenMP
OpenMP relies heavily on barrier synchronization to coordinate the work of threads that are performing the computations in a parallel region. A good implementation of barriers is ...
Ramachandra C. Nanjegowda, Oscar Hernandez, Barbar...
ARC
2008
Springer
14 years 11 months ago
A High Throughput FPGA-based Floating Point Conjugate Gradient Implementation
As Field Programmable Gate Arrays (FPGAs) have reached capacities beyond millions of equivalent gates, it becomes possible to accelerate floating-point scientific computing applica...
Antonio Roldao Lopes, George A. Constantinides
ICPP
2002
IEEE
15 years 2 months ago
Analysis of Memory Hierarchy Performance of Block Data Layout
Recently, several experimental studies have been conducted on block data layout as a data transformation technique used in conjunction with tiling to improve cache performance. In...
Neungsoo Park, Bo Hong, Viktor K. Prasanna