Sciweavers

2609 search results - page 203 / 522
» Optimizing for parallelism and data locality
PARLE
1994
15 years 9 months ago
Run-Time Optimization of Sparse Matrix-Vector Multiplication on SIMD Machines
Sparse matrix-vector multiplication forms the heart of iterative linear solvers used widely in scientific computations (e.g., finite element methods). In such solvers, the matrix-v...
Louis H. Ziantz, Can C. Özturan, Boleslaw K. ...
LCTRTS
2010
Springer
16 years 28 days ago
Operation and data mapping for CGRAs with multi-bank memory
Coarse Grain Reconfigurable Architectures (CGRAs) promise high performance at high power efficiency. They fulfil this promise by keeping the hardware extremely simple, and movi...
Yongjoo Kim, Jongeun Lee, Aviral Shrivastava, Yunh...
KES
2005
Springer
15 years 11 months ago
Learning Method for Automatic Acquisition of Translation Knowledge
This paper presents a new learning method for automatic acquisition of translation knowledge from parallel corpora. We apply this learning method to automatic extraction of bilingu...
Hiroshi Echizen-ya, Kenji Araki, Yoshio Momouchi
EUROPAR
2011
Springer
14 years 5 months ago
Model-Driven Tile Size Selection for DOACROSS Loops on GPUs
DOALL loops are tiled to exploit DOALL parallelism and data locality on GPUs. In contrast, due to loop-carried dependences, DOACROSS loops must be skewed first in order to make ti...
Peng Di, Jingling Xue
IPPS
2008
IEEE
16 years 15 days ago
Design and optimization of a distributed, embedded speech recognition system
In this paper, we present the design and implementation of a distributed sensor network application for embedded, isolated-word, real-time speech recognition. In our system design...
Chung-Ching Shen, William Plishker, Shuvra S. Bhat...