Partitioning an application among software running on a microprocessor and hardware co-processors in on-chip configurable logic has been shown to improve performance and energy co...
We report on the design, implementation, and evaluation of a system called Cedar that enables mobile database access with good performance over low-bandwidth networks. This is acc...
We propose the first polynomial-time code selection algorithm for minimising the worst-case execution time of a nonnested loop executed on a fully pipelined processor that uses sc...
We are attacking the memory bottleneck by building a “smart” memory controller that improves effective memory bandwidth, bus utilization, and cache efficiency by letting appl...
Binu K. Mathew, Sally A. McKee, John B. Carter, Al...
A stream processor executes an application that has been decomposed into a sequence of kernels that operate on streams of data elements. During the execution of a kernel, all stre...
Xuejun Yang, Li Wang, Jingling Xue, Yu Deng, Ying ...