Sciweavers

64 search results - page 11 / 13
» Optimal register assignment to loops for embedded code gener...
Sort
View
CF
2009
ACM
15 years 5 months ago
Mapping the LU decomposition on a many-core architecture: challenges and solutions
Recently, multi-core architectures with alternative memory subsystem designs have emerged. Instead of using hardwaremanaged cache hierarchies, they employ software-managed embedde...
Ioannis E. Venetis, Guang R. Gao
ASPLOS
2008
ACM
15 years 28 days ago
Communication optimizations for global multi-threaded instruction scheduling
The recent shift in the industry towards chip multiprocessor (CMP) designs has brought the need for multi-threaded applications to mainstream computing. As observed in several lim...
Guilherme Ottoni, David I. August
86
Voted
CODES
2004
IEEE
15 years 2 months ago
Operation tables for scheduling in the presence of incomplete bypassing
Register bypassing is a powerful and widely used feature in modern processors to eliminate certain data hazards. Although complete bypassing is ideal for performance, bypassing ha...
Aviral Shrivastava, Eugene Earlie, Nikil D. Dutt, ...
106
Voted
LCTRTS
2005
Springer
15 years 4 months ago
Cache aware optimization of stream programs
Effective use of the memory hierarchy is critical for achieving high performance on embedded systems. We focus on the class of streaming applications, which is increasingly preval...
Janis Sermulins, William Thies, Rodric M. Rabbah, ...
109
Voted
LCTRTS
2005
Springer
15 years 4 months ago
Complementing software pipelining with software thread integration
Software pipelining is a critical optimization for producing efficient code for VLIW/EPIC and superscalar processors in highperformance embedded applications such as digital sign...
Won So, Alexander G. Dean