The thesis of this research is that the task of exposing the parallelism in a given application should be left to the algorithm designer, who has intimate knowledge of the applica...
DTA (Decoupled Threaded Architecture) is designed to exploit fine/medium grained Thread Level Parallelism (TLP) by using a distributed hardware scheduling unit and relying on exi...
GPU-based heterogeneous clusters continue to draw attention from vendors and HPC users due to their high energy efficiency and much improved single-node computational performance...
Clustered L0 buffers are an interesting alternative to reduce energy consumption in the instruction memory hierarchy of embedded VLIW processors. Currently, the synthesis of L0 cl...
Murali Jayapala, Tom Vander Aa, Francisco Barat, G...
Symmetric Multiprocessors (SMPs), combined with modern interconnection technologies are commonly used to build cost-effective compute clusters. However, contention among processor...