We use reconfigurable hardware to construct a high throughput Bayesian computing machine (BCM) capable of evaluating probabilistic networks with arbitrary DAG (directed acyclic gr...
When integrating software threads together to boost performance on a processor with instruction-level parallel processing support, it is rarely clear which code regions should be ...
Untolerated load instruction latencies often have a significant impact on overall program performance. As one means of mitigating this effect, we present an aggressive hardware-b...
Queues are commonly used in multithreaded programs for synchronization and communication. However, because software queues tend to be too expensive to support finegrained paralle...
Sanghoon Lee, Devesh Tiwari, Yan Solihin, James Tu...
Optimization of hardware resources for conditional data-flow graph behavior is particularly important when conditional behavior occurs in cyclic loops and maximization of through...