We present a parallel code generation algorithm for complete applications and a new experimental methodology that tests the efficacy of our approach. The algorithm optimizes for d...
The routing architecture of the original 4.4BSD [3] kernel has been deployed successfully without major design modification for over 15 years. In the unified routing architectur...
Multicore designs have emerged as the mainstream design paradigm for the microprocessor industry. Unfortunately, providing multiple cores does not directly translate into performa...
Mojtaba Mehrara, Jeff Hao, Po-Chun Hsu, Scott A. M...
With the increasing processing power, the latency of the memory hierarchy becomes the stumbling block of many modern computer architectures. In order to speed-up the calculations, ...
Many irregular scientific computing problems can be modeled by directed acyclic task graphs (DAGs). In this paper, we present an efficient run-time system for executing general as...