Evaluating performance with only a subset of the programs in a benchmark suite is commonplace in computer architecture research. This is especially true during early design space exploration, when a variety of enhancements must be evaluated in a limited amount of time to arrive at a good microprocessor architecture. When such a subset is used to evaluate architectural enhancements, it is essential that the subset be well distributed within the target workload space rather than clustered in specific areas. Past efforts to identify subsets have relied primarily on microarchitecture-dependent metrics of program performance, such as cycles per instruction and cache miss rate. The shortcoming of this approach is that the results can be biased by the idiosyncrasies of the chosen microarchitecture configurations. We believe that a technique based on measuring the inherent characteristics of a program will make the results applicable to any microarchitecture. T...