Sciweavers

244 search results - page 18 / 49
» Optimizing Loop Performance for Clustered VLIW Architectures
Sort
View
PLDI
1993
ACM
15 years 3 months ago
Global Optimizations for Parallelism and Locality on Scalable Parallel Machines
Data locality is critical to achievinghigh performance on large-scale parallel machines. Non-local data accesses result in communication that can greatly impact performance. Thus ...
Jennifer-Ann M. Anderson, Monica S. Lam
ISCAS
2005
IEEE
191views Hardware» more  ISCAS 2005»
15 years 5 months ago
Behavioural modeling and simulation of a switched-current phase locked loop
Recent work has shown that the use of switched current methods can provide an effective route to implementation of analog IC functionality using a standard digital CMOS process. Fu...
Peter R. Wilson, Reuben Wilcock
ISCAPDCS
2001
15 years 1 months ago
Branch Prediction of Conditional Nested Loops through an Address Queue
-Multi-dimensional applications, such as image processing and seismic analysis, usually require the optimized performance obtained from instruction-level parallelism. The critical ...
Zhigang Jin, Nelson L. Passos, Virgil Andronache
CASES
2006
ACM
15 years 3 months ago
Efficient architectures through application clustering and architectural heterogeneity
Customizing architectures for particular applications is a promising approach to yield highly energy-efficient designs for embedded systems. This work explores the benefits of arc...
Lukasz Strozek, David Brooks
CLUSTER
2003
IEEE
15 years 5 months ago
Improving the Performance of MPI Derived Datatypes by Optimizing Memory-Access Cost
The MPI Standard supports derived datatypes, which allow users to describe noncontiguous memory layout and communicate noncontiguous data with a single communication function. Thi...
Surendra Byna, William D. Gropp, Xian-He Sun, Raje...