Sciweavers

62 search results - page 3 / 13
» Adaptive memory programming for matrix bandwidth minimizatio...
ASAP
2004
IEEE
13 years 9 months ago
Automatic Synthesis of Customized Local Memories for Multicluster Application Accelerators
Distributed local memories, or scratchpads, have been shown to effectively reduce cost and power consumption of application-specific accelerators while maintaining performance. Th...
Manjunath Kudlur, Kevin Fan, Michael L. Chu, Scott...
IEEEHPCS
2010
13 years 3 months ago
Reducing memory requirements of stream programs by graph transformations
Stream languages explicitly describe fork-join parallelism and pipelines, offering a powerful programming model for many-core Multi-Processor Systems on Chip (MPSoC). In an embedd...
Pablo de Oliveira Castro, Stéphane Louise, ...
EUROPAR
2010
Springer
13 years 6 months ago
Optimized Dense Matrix Multiplication on a Many-Core Architecture
Traditional parallel programming methodologies for improving performance assume cache-based parallel systems. However, new architectures, like the IBM Cyclops-64 (C64), b...
Elkin Garcia, Ioannis E. Venetis, Rishi Khan, Guan...
BIBE
2007
IEEE
13 years 11 months ago
Differential Scoring for Systolic Sequence Alignment
Systolic implementations of dynamic programming solutions that utilize a similarity matrix can achieve appreciable performance with both coarse- and fine-grain parallelization. A ...
Antonio E. de la Serna
SIAMSC
2010
13 years 3 months ago
Weighted Matrix Ordering and Parallel Banded Preconditioners for Iterative Linear System Solvers
The emergence of multicore architectures and highly scalable platforms motivates the development of novel algorithms and techniques that emphasize concurrency and are tolerant of ...
Murat Manguoglu, Mehmet Koyutürk, Ahmed H. Sa...