Sciweavers

EUROPAR
2011
Springer
14 years 2 months ago
A Fully Empirical Autotuned Dense QR Factorization for Multicore Architectures
Tuning numerical libraries has become more difficult over time, as systems get more sophisticated. In particular, modern multicore machines make the behaviour of algorithms hard ...
Emmanuel Agullo, Jack Dongarra, Rajib Nath, Stanim...
HPCC
2011
Springer
14 years 2 months ago
Heuristic-Based Techniques for Mapping Irregular Communication Graphs to Mesh Topologies
Mapping of parallel applications on the network topology is becoming increasingly important on large supercomputers. Topology aware mapping can reduce the hops traveled by mess...
Abhinav Bhatele, Laxmikant V. Kalé
IPPS
2000
IEEE
15 years 7 months ago
Augmenting Modern Superscalar Architectures with Configurable Extended Instructions
The instruction sets of general-purpose microprocessors are designed to offer good performance across a wide range of programs. The size and complexity of the instruction sets, how...
Xianfeng Zhou, Margaret Martonosi
GRID
2007
Springer
15 years 8 months ago
Grid-based asynchronous replica exchange
Replica exchange is a powerful sampling algorithm and can be effectively used for applications such as simulating the structure, function, folding, and dynamics of proteins and...
Zhen Li, Manish Parashar
ANCS
2005
ACM
15 years 8 months ago
SSA: a power and memory efficient scheme to multi-match packet classification
New network applications like intrusion detection systems and packet-level accounting require multi-match packet classification, where all matching filters need to be reported. Te...
Fang Yu, T. V. Lakshman, Martin Austin Motoyama, R...