It is well known that LDPC decoding is computationally demanding and one of the hardest signal operations to parallelize. Beyond data dependencies that restrict the decoding of a ...
Iterative numerical algorithms with high memory bandwidth requirements but medium-size data sets (matrix size ∼ a few 100s) are highly appropriate for FPGA acceleration. This pap...
Abid Rafique, Nachiket Kapre, George A. Constantin...
Synchronization in distributed systems is expensive because, in general, threads must stall to obtain a lock or to operate on volatile data. Transactional memory, on the other hand...
Abstract—Load balancing algorithms are an essential component of parallel computing reducing the response time of applications. Frequently, balancing algorithms have a centralize...
Juan Santana-Santana, Miguel A. Castro-Garcí...
This paper proposes a cache hierarchy that enables Web search engines to efficiently process user queries. The different caches in the hierarchy are used to store pieces of data w...