Sciweavers

» Performance results of running parallel applications on the ...
IEEEPACT
2009
IEEE
15 years 4 months ago
Automatic Tuning of Discrete Fourier Transforms Driven by Analytical Modeling
Analytical models have been used to estimate optimal values for parameters such as tile sizes in the context of loop nests. However, important algorithms such as fast Fourier tr...
Basilio B. Fraguela, Yevgen Voronenko, Markus Pü...
ARITH
2007
IEEE
15 years 4 months ago
A New Architecture For Multiple-Precision Floating-Point Multiply-Add Fused Unit Design
The floating-point multiply-add fused (MAF) unit sets a new trend in the processor design to speed up floating-point performance in scientific and multimedia applications. This ...
Libo Huang, Li Shen, Kui Dai, Zhiying Wang
DATE
2007
IEEE
15 years 4 months ago
Improve CAM power efficiency using decoupled match line scheme
Content addressable memory (CAM) is widely used in many applications that require fast table lookup. Due to the parallel comparison feature and high frequency of lookup, however, ...
Yen-Jen Chang, Yuan-Hong Liao, Shanq-Jang Ruan
CLUSTER
2005
IEEE
15 years 3 months ago
Near Overhead-free Heterogeneous Thread-migration
Thread migration moves a single call-stack to another machine to improve either load balancing or locality. Current approaches for checkpointing and thread migration are either no...
Ronald Veldema, Michael Philippsen
ASPLOS
1991
ACM
15 years 1 month ago
NUMA Policies and Their Relation to Memory Architecture
Multiprocessor memory reference traces provide a wealth of information on the behavior of parallel programs. We have used this information to explore the relationship between kern...
William J. Bolosky, Michael L. Scott, Robert P. Fi...