Sciweavers

IRREGULAR
1995
Springer
Run-Time Parallelization of Irregular DOACROSS Loops
Dependencies between iterations of loop structures cannot always be determined at compile-time because they may depend on input data which is known only at run-time. A prime examp...
V. Prasad Krothapalli, Thulasiraman Jeyaraman, Mar...
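The sketch below shows why such dependences can only be resolved at run time. It assumes a hypothetical loop `for i: a[w[i]] = a[r[i]] + 1`, where the index arrays `w` and `r` are input data. An inspector pass scans the index arrays and records, for each iteration, the latest earlier iteration that wrote the element it reads. In a DOACROSS-style execution, iteration `i` would then wait only for that one iteration. The names `inspect_doacross`, `w`, and `r` are illustrative assumptions. This is a generic inspector, not the method of the paper listed above.

```python
from typing import Dict, List, Optional


def inspect_doacross(w: List[int], r: List[int]) -> List[Optional[int]]:
    """Inspector for the hypothetical loop
        for i in range(n): a[w[i]] = a[r[i]] + 1
    Returns, for each iteration i, the latest earlier iteration j < i with
    w[j] == r[i] (a loop-carried flow dependence), or None if there is none.
    The result depends on the contents of w and r, which are known only at
    run time."""
    last_writer: Dict[int, int] = {}  # array element -> latest iteration that wrote it
    wait_for: List[Optional[int]] = []
    for i in range(len(w)):
        # Iteration i reads a[r[i]], so it depends on the latest earlier write to it.
        wait_for.append(last_writer.get(r[i]))
        # Record this iteration's write only after its read, matching the loop body.
        last_writer[w[i]] = i
    return wait_for


if __name__ == "__main__":
    # Iterations 0 and 2 have no predecessor and can start immediately;
    # iteration 1 waits for 0, and iteration 3 waits for 1.
    print(inspect_doacross([0, 1, 2, 3], [4, 0, 5, 1]))
```

With different index arrays, the same loop can turn into a fully serial chain, for example `w = [0, 1, 0, 2]` and `r = [3, 0, 1, 0]`. A compiler cannot tell these cases apart without knowing the input.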
CGO
2004
IEEE
Single-Dimension Software Pipelining for Multi-Dimensional Loops
Traditionally, software pipelining is applied either to the innermost loop of a given loop nest or from the innermost loop to outer loops. In this paper, we propose a three-step ap...
Hongbo Rong, Zhizhong Tang, Ramaswamy Govindarajan...
CF
2007
ACM
Identifying potential parallelism via loop-centric profiling
The transition to multithreaded, multi-core designs places a greater responsibility on programmers and software for improving performance; thread-level parallelism (TLP) will be i...
Tipp Moseley, Daniel A. Connors, Dirk Grunwald, Ra...
ASPLOS
1994
ACM
Compiler Optimizations for Improving Data Locality
In the past decade, processor speed has become significantly faster than memory speed. Small, fast cache memories are designed to overcome this discrepancy, but they are only effe...
Steve Carr, Kathryn S. McKinley, Chau-Wen Tseng
IEEEPACT
1998
IEEE
A Matrix-Based Approach to the Global Locality Optimization Problem
Global locality analysis is a technique for improving the cache performance of a sequence of loop nests through a combination of loop and data layout optimizations. Pure loop tran...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
IPPS
1999
IEEE
Cascaded Execution: Speeding Up Unparallelized Execution on Shared-Memory Multiprocessors
Both inherently sequential code and limitations of analysis techniques prevent full parallelization of many applications by parallelizing compilers. Amdahl's Law tells us tha...
Ruth E. Anderson, Thu D. Nguyen, John Zahorjan
ICPP
1999
IEEE
Access Descriptor Based Locality Analysis for Distributed-Shared Memory Multiprocessors
Most of today's multiprocessors have a Distributed-Shared Memory (DSM) organization, which enables scalability while retaining the convenience of the shared-memory programming...
Angeles G. Navarro, Rafael Asenjo, Emilio L. Zapat...
ICPP
1999
IEEE
Compiler Optimizations for I/O-Intensive Computations
This paper describes transformation techniques for out-of-core programs (i.e., those that deal with very large quantities of data) based on exploiting locality using a combination...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
IPPS
2003
IEEE
Loop Dissevering: A Technique for Temporally Partitioning Loops in Dynamically Reconfigurable Computing Platforms
This paper presents a technique, called loop dissevering, for temporally partitioning any type of loop present in programming languages. The technique can be used in the presence...
João M. P. Cardoso
ISPA
2004
Springer
An Inspector-Executor Algorithm for Irregular Assignment Parallelization
A loop with irregular assignment computations contains loop-carried output data dependences that can only be detected at run-time. In this paper, a load-balanced method ba...
Manuel Arenaz, Juan Touriño, Ramon Doallo
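The sketch below illustrates what "loop-carried output data dependences that can only be detected at run-time" means for an irregular assignment. It assumes a hypothetical loop `for i: a[f[i]] = g(i)`, where the subscript array `f` is input data. The inspector groups iterations by the element they write. Any element with more than one writer carries an output dependence. Under sequential semantics, the last writer in program order determines the final value, so each element's group can be handled independently of the others, for example by separate threads. This is a minimal inspector-executor under stated assumptions: `g` is side-effect free, and `inspect_output_deps` and `execute` are hypothetical names. It is not the load-balanced method of the paper above, whose abstract is truncated here.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def inspect_output_deps(f: List[int]) -> Dict[int, List[int]]:
    """Inspector for the hypothetical loop
        for i in range(n): a[f[i]] = g(i)
    Maps each target element to the iterations that write it, in original
    order. Elements with more than one writer carry output dependences that
    are visible only once the contents of f are known."""
    writers: Dict[int, List[int]] = defaultdict(list)
    for i, target in enumerate(f):
        writers[target].append(i)
    return dict(writers)


def execute(a: List[float], f: List[int], g: Callable[[int], float]) -> List[float]:
    """Executor: for each written element, keep only the value of the last
    writer in program order, which reproduces the sequential result when g
    is side-effect free. The per-element groups are independent of each
    other; this sketch processes them in a plain loop for clarity."""
    out = list(a)
    for target, iters in inspect_output_deps(f).items():
        out[target] = g(iters[-1])  # only the last write in program order survives
    return out


if __name__ == "__main__":
    f = [2, 0, 2, 1]  # element 2 is written by iterations 0 and 2: an output dependence
    print(inspect_output_deps(f))
    print(execute([0, 0, 0, 0], f, lambda i: i * 10))
```

Elements that are never written (index 3 in the usage above) keep their original values, just as they would in the sequential loop.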