Abstract
Contemporary High Performance Computing (HPC) applications can exhibit unacceptably high overheads when existing instrumentation-based performance analysis tools are applied. In our experience, these tools can cause, on average, a fivefold increase in runtime for some sections of these codes. In a performance modelling context, such less representative runs can misdirect the modelling process.
We present an approach to recording call paths for optimised HPC application binaries without the need for instrumentation. As a result, a new tool has been developed which complements our work on analytical- and simulation-based performance modelling. The utility of this approach, in terms of low and consistent runtime overhead, is demonstrated by a comparative evaluation against existing tools for a range of recognised HPC benchmark codes.