In processors with several levels of hardware resource sharing, like CMPs in which each core is an SMT, the scheduling process becomes more complex than in processors with a singl...
Petar Radojkovic, Vladimir Cakarevic, Javier Verd&...
Many techniques for increasing the amount of instruction-level parallelism (ILP) put increased pressure on the registers inside a CPU. These techniques allow for more operations t...
Jason Hiser, Steve Carr, Philip H. Sweany, Steven ...
In this paper we investigate the benefit of scheduling non-critical loads for a higher latency during software pipelining. "Noncritical" denotes those loads that have s...
Sebastian Winkel, Rakesh Krishnaiyer, Robyn Sampso...
Increased integration in the form of multiple processor cores on a single die, relatively constant die sizes, shrinking power envelopes, and emerging applications create a new cha...
Srikanth T. Srinivasan, Ravi Rajwar, Haitham Akkar...
— In this paper we give a theoretical model for determining the synchronization frequency that minimizes the parallel execution time of loops with uniform dependencies dynamicall...
Florina M. Ciorba, Ioannis Riakiotakis, Theodore A...