In this paper, we describe a generalized approach to deriving a custom data layout in multiple memory banks for array-based computations, to facilitate high-bandwidth parallel mem...
To meet the high demand for powerful embedded processors, VLIW architectures are increasingly complex (e.g., multiple clusters), and moreover, they now run increasingly sophistica...
Memory profiling is the process of characterizing a program's memory behavior by observing and recording its response to specific input sets. Relevant aspects of the program&...
Pre-execution techniques have received much attention as an effective way of prefetching cache blocks to tolerate the everincreasing memory latency. A number of pre-execution tech...
Dongkeun Kim, Shih-Wei Liao, Perry H. Wang, Juan d...