This paper presents a novel cycle-approximate performance estimation technique for automatically generated transaction level models (TLMs) for heterogeneous multicore designs. The...
Constructing scalable high-performance applications on commodity hardware running the Unix operating system is a problem that must be addressed in several application domains. We ...
On-chip communication design includes designing software (SW) parts (operating system, device drivers, interrupt service routines, etc.) as well as hardware (HW) parts (on-chip co...
Global address space languages like UPC exhibit high performance and portability on a broad class of shared and distributed memory parallel architectures. The most scalable applic...
We present an implementation and evaluation of atomicity (also known as software transactions) for a dialect of Java. Our implementation is fundamentally different from prior work...