Hiding communication latency is an important optimization for parallel programs. Programmers or compilers achieve this by using non-blocking communication primitives and overlappi...
Conventional processors use a fully-associative store queue (SQ) to implement store-load forwarding. Associative search latency does not scale well to capacities and bandwidths re...
– Reducing power consumption through high-level synthesis has attracted a growing interest from researchers due to its large potential for power reduction. In this work we study ...
The amount of data handled by real-time and embedded applications is increasing. Also, applications normally have constraints with respect to freshness and timeliness of the data ...
The success of stochastic algorithms is often due to their ability to effectively amplify the performance of search heuristics. This is certainly the case with stochastic sampling ...