Many sorting algorithms have been studied in the past, but there are only a few algorithms that can effectively exploit both SIMD instructions and threadlevel parallelism. In this...
An integrated, hardware / software co-designed CISC processor is proposed and analyzed. The objectives are high performance and reduced complexity. Although the x86 ISA is targete...
Shiliang Hu, Ilhyun Kim, Mikko H. Lipasti, James E...
In this paper, we propose a multithreaded processor architecture which improves machine throughput. In our processor architecture, instructions from different threads (not a singl...
Control dominated embedded systems have to be designed for fast reaction to asynchronous external events occurring in the environment. Such systems must be able to perform signal ...
Partha S. Roop, Zoran A. Salcic, Morteza Biglari-A...
On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit ...
Marc Baboulin, Alfredo Buttari, Jack Dongarra, Jak...