Abstract. In this paper, we propose a universal network, called recursive dual-net (RDN). It can be used as a candidate of effective interconnection networks for massively parallel...
We present the algorithm to multiply univariate polynomials with integer coefficients efficiently using the Number Theoretic transform (NTT) on Graphics Processing Units (GPU). The...
Abstract. With more cores integrated into one single chip, the overall power consumption from the multiple concurrent running programs increases dramatically in a CMP processor whi...
SIMD extension is one of the most common and effective technique to exploit data-level parallelism in today’s processor designs. However, the performance of SIMD architectures i...
Abstract. In many-core CMP architectures, the cache coherence protocol is a key component since it can add requirements of area and power consumption to the final design and, there...