Bulk memory copies incur large overheads such as CPU stalling (i.e., no overlap of computation with memory copy operation), small register-size data movement, cache pollution, etc...
Karthikeyan Vaidyanathan, Lei Chai, Wei Huang, Dha...
With the current trend toward multicore architectures, improved execution performance can no longer be obtained via traditional single-thread instruction level parallelism (ILP), ...
We present a pipelining, dynamically user-controllable reorder operator, for use in data-intensive applications. Allowing the user to reorder the data delivery on the fly increases...
Vijayshankar Raman, Bhaskaran Raman, Joseph M. Hel...
Branch-and-bound algorithms are general methods applicable to various combinatorial optimization problems and parallelization is one of the most hopeful methods to improve these a...
We present Darwin, an enabling technology for mobile phone sensing that combines collaborative sensing and classification techniques to reason about human behavior and context on ...