Achieving high performance on today’s architectures requires careful orchestration of many optimization parameters. In particular, the presence of shared-caches on multicore arch...
This work presents the first extensive study of singlenode performance optimization, tuning, and analysis of the fast multipole method (FMM) on modern multicore systems. We consid...
Aparna Chandramowlishwaran, Samuel Williams, Leoni...
Online learning algorithms have impressive convergence properties when it comes to risk minimization and convex games on very large problems. However, they are inherently sequenti...
Daniel Hsu, Nikos Karampatziakis, John Langford, A...
We present a scheduling algorithm of stream programs for multi-core architectures called team scheduling. Compared to previous multi-core stream scheduling algorithms, team schedu...
Abstract. A parallel Lattice Boltzmann Method (pLBM), which is based on hierarchical spatial decomposition, is designed to perform large-scale flow simulations. The algorithm uses ...
Liu Peng, Ken-ichi Nomura, Takehiro Oyakawa, Rajiv...