Sciweavers

131 search results - page 14 / 27
» Copy Elimination for Parallelizing Compilers
ICS
1994
Tsinghua U.
15 years 1 month ago
Evaluating automatic parallelization for efficient execution on shared-memory multiprocessors
We present a parallel code generation algorithm for complete applications and a new experimental methodology that tests the efficacy of our approach. The algorithm optimizes for d...
Kathryn S. McKinley
SIGCOMM
2009
ACM
15 years 4 months ago
Optimizing the BSD routing system for parallel processing
The routing architecture of the original 4.4BSD [3] kernel has been deployed successfully without major design modification for over 15 years. In the unified routing architectur...
Qing Li, Kip Macy
PLDI
2009
ACM
15 years 4 months ago
Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory
Multicore designs have emerged as the mainstream design paradigm for the microprocessor industry. Unfortunately, providing multiple cores does not directly translate into performa...
Mojtaba Mehrara, Jeff Hao, Po-Chun Hsu, Scott A. M...
EUROPAR
2000
Springer
15 years 1 month ago
Cache Remapping to Improve the Performance of Tiled Algorithms
With increasing processing power, the latency of the memory hierarchy becomes the stumbling block of many modern computer architectures. In order to speed up the calculations, ...
Kristof Beyls, Erik H. D'Hollander
IPPS
1996
IEEE
15 years 1 month ago
Efficient Run-Time Support for Irregular Task Computations with Mixed Granularities
Many irregular scientific computing problems can be modeled by directed acyclic task graphs (DAGs). In this paper, we present an efficient run-time system for executing general as...
Cong Fu, Tao Yang