Loop vectorization, a key feature exploited to obtain high performance on Single Instruction Multiple Data (SIMD) vector architectures, is significantly hindered by irregular memo...
Byunghyun Jang, Perhaad Mistry, Dana Schaa, Rodrig...
This paper introduces the microarchitecture and logical implementation of an SMT (Simultaneous Multithreading) improvement of the Godson-2 processor, which is a 64-bit, four-issue, out-...
Simulated annealing based standard cell placement for VLSI designs has long been acknowledged as a compute-intensive process. All previous work in parallel simulated annealing bas...
The paper describes the development and performance of parallel algorithms for the discrete element method (DEM) software. Spatial domain decomposition strategy and message passing...
Algirdas Maknickas, Arnas Kaceniauskas, Rimantas K...
The shared-memory programming model is a very effective way to achieve parallelism on shared-memory parallel computers. As great progress was made in hardware and software technolo...