This work is devoted to the numerical solution of the 4D Vlasov equation using an adaptive mesh of phase space. We previously proposed a parallel algorithm designed for distribut...
Event tracing and monitoring of parallel applications are difficult if each processor has its own unsynchronized clock. A survey is given of several strategies to generate a glob...
We present PerfMiner, a system for the transparent collection, storage and presentation of thread-level hardware performance data across an entire cluster. Every sub-proc...
Philip Mucci, Daniel Ahlin, Johan Danielsson, Per ...
Orca is a portable, object-based distributed shared memory system. This paper studies and evaluates the design choices made in the Orca system and compares Orca with other DSMs. T...
Henri E. Bal, Raoul Bhoedjang, Rutger F. H. Hofman...
Microprocessor vendors have provided special-purpose instructions such as psadbw and pdist to accelerate the sum-of-absolute-differences (SAD) similarity measurement. The usefulne...
Asadollah Shahbahrami, Ben H. H. Juurlink, Stamati...
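For readers unfamiliar with the SAD metric named in the last abstract, a minimal scalar sketch follows. It is only an illustration of the quantity itself, not code from that paper; the function name `sad` and the sample blocks are hypothetical. Instructions such as psadbw and pdist compute this kind of sum over packed 8-bit values in hardware, whereas this version loops over the elements one at a time.

```python
def sad(block_a, block_b):
    """Sum of absolute differences (SAD) between two equal-length sample blocks.

    Hypothetical scalar reference for illustration only; SIMD instructions such
    as psadbw/pdist compute this kind of sum over packed 8-bit values.
    """
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same length")
    # Add up the absolute per-element differences.
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


# Two hypothetical 8-pixel blocks, e.g. from a current and a reference frame.
cur = [10, 20, 30, 40, 50, 60, 70, 80]
ref = [12, 18, 30, 45, 50, 55, 75, 80]
print(sad(cur, ref))  # 19
```

A smaller SAD means the two blocks are more similar, which is why the metric is used to pick the best-matching block.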