Abstract. Profiling is often the method of choice for performance analysis of parallel applications due to its low overhead and easily comprehensible results. However, a disadvanta...
Abstract. In an embedded multiprocessor system, the minimum throughput and maximum latency of real-time applications are usually derived given the worst-case execution time of the s...
Arno Moonen, Marco Bekooij, Rene van den Berg, Jef...
In this paper we present our work toward FPGA acceleration of phylogenetic reconstruction, a type of analysis that is commonly performed in the fields of systematic biology and co...
Many sorting algorithms have been studied in the past, but only a few of them can effectively exploit both SIMD instructions and thread-level parallelism. In this...
In this paper, we describe the design of a C library named PTPol implementing arithmetic operations for univariate polynomials and report on practical experiments showing the rele...