In hardware implementation for the finite field, the use of normal basis has several advantages, especially the optimal normal basis is the most efficient to hardware implementati...
Chang Han Kim, Yongtae Kim, Sung Yeon Ji, IlWhan P...
Transactional memory (TM) provides an easy-to-use and high-performance parallel programming model for the upcoming chip-multiprocessor systems. Several researchers have proposed a...
JaeWoong Chung, Hassan Chafi, Chi Cao Minh, Austen...
—Analytical models have been used to estimate optimal values for parameters such as tile sizes in the context of loop nests. However, important algorithms such as fast Fourier tr...
Basilio B. Fraguela, Yevgen Voronenko, Markus P&uu...
We demonstrate that data reordering can substantially improve the performance of fine-grained irregular sharedmemory benchmarks, on both hardware and software shared-memory syste...
Over the last decade, Message Passing Interface (MPI) has become a very successful parallel programming environment for distributed memory architectures such as clusters. However, ...