Modern processors use branch target buffers (BTBs) to predict the target address of branches such that they can fetch ahead in the instruction stream increasing concurrency and pe...
Iterative stencil loops (ISLs) are used in many applications and tiling is a well-known technique to localize their computation. When ISLs are tiled across a parallel architecture...
An ensemble is a set of learned models that make decisions collectively. Although an ensemble is usually more accurate than a single learner, existing ensemble methods often tend ...
Hard-to-predict branches depending on longlatency cache-misses have been recognized as a major performance obstacle for modern microprocessors. With the widening speed gap between...
Hongliang Gao, Yi Ma, Martin Dimitrov, Huiyang Zho...
Titanium is a language and system for high-performance parallel scientific computing. Titanium uses Java as its base, thereby leveraging the advantages of that language and allow...
Katherine A. Yelick, Luigi Semenzato, Geoff Pike, ...