FPGA-based floating-point kernels must exploit algorithmic parallelism and use deeply pipelined cores to gain a performance advantage over general-purpose processors. Inability t...
The performance of MPI collective operations, such as broadcast and reduction, is heavily affected by network topologies, especially in grid environments. Many techniques to cons...
The objective of this paper is to extend, in the context of multicore architectures, the concepts of tile algorithms [Buttari et al., 2007] for Cholesky, LU, QR factorizations to t...
Gate oxide tunneling current Igate and sub-threshold current Isub dominate the leakage of designs. The latter depends on threshold voltage Vth while Igate vary with the thickness ...
Clock meshes are extremely effective at filtering clock skew from environmental and process variations. For this reason, clock meshes are used in most high performance designs. Ho...