As applications tend to grow more complex and use more memory, the demand for cache space increases. Thus embedded processors are inclined to use larger caches. Predicting a miss i...
Commonly-used memory units enable a processor to load and store multiple registers in one instruction. We showed in 2003 how to extend gcc with a stack-location-allocation (SLA) ph...
This paper presents a connection handoff interface between the operating system and the network interface. Using this interface, the operating system can offload a subset of TCP c...
Abstract. The technique of flattening nested data parallelism combines all the independent operations in nested apply-to-all constructs and generates large amounts of potential pa...
Daniel W. Palmer, Jan Prins, Siddhartha Chatterjee...
We present Offload, a programming model for offloading parts of a C++ application to run on accelerator cores in a heterogeneous multicore system. Code to be offloaded is enclosed ...
Pete Cooper, Uwe Dolinsky, Alastair F. Donaldson, ...