Monday, August 13, 2012

PmcTools: Motivation and Future Steps

This (long-delayed) post describes the original motivation for my PmcTools whole-system profiling toolkit, and touches on some possible next steps for the project.


Around the year 2000, I happened to read a paper titled "Continuous Profiling: Where Have All the Cycles Gone?". The techniques described in the paper were inspiring, but the DEC Alpha™ systems they were implemented on were out of reach for a hobbyist living in India. A FreeBSD™ equivalent of those tools and techniques, running on affordable hardware, seemed like a good idea.

From the outset, my goal was to create a programming toolkit for using in-CPU performance counters:

  • I wanted an API that would permit tools to fully use the features provided by the hardware.
  • I wanted tools that had low overheads, in order not to disturb the behaviour of the system being measured.
  • I wanted tools that were non-disruptive: usable without restarting running processes, rebooting the system, or recompiling anything.
  • I wanted to analyse the "whole system" at once; i.e., to simultaneously analyse userspace applications, the top half of the kernel, and the kernel's interrupt handlers.
  • I wanted the toolset to be SMP-ready, since SMP systems seemed likely to become affordable in the future.

When affordable systems using AMD Athlon™ CPUs (with publicly documented in-CPU performance counters) entered the Indian market in early 2003, I built myself a machine and started on the project.


  • 2003: Initial work, which was managed using homebrew tools tuned for dialup speeds (shell scripts running RCS layered over CVS/CVSup).
  • 2004: With the arrival of broadband access, development moved to FreeBSD's Perforce™ server.
  • 2005: First check-in into the FreeBSD source tree, in April.

Current Status

At the time of writing, PmcTools is being actively maintained and extended by the FreeBSD community.

The design of the toolkit is briefly described in a tech talk presented at ACM Bangalore in 2009 (slides).

Future Steps

Platforms, simplicity and portability are likely to be the focus of future work.

  • PmcTools would be useful on popular hobby platforms such as the BeagleBoard. PmcTools already supports 'remote' data collection on embedded systems. However, the specific PMCs on these systems would need to be supported.
  • Based on the experience gained so far, both the programming APIs and the implementation of PmcTools could be simplified without losing useful functionality.
  • PmcTools would be a useful addition to other open-source operating systems.

In addition to the above, many innovative tools can be created. For example, in the paper "Exploiting hardware performance counters with flow and context sensitive profiling", the authors show how to add PMC-based instrumentation to program binaries for fine-grained analyses. Adding such instrumentation requires tools that can parse and modify binary instruction streams, which is one of the motivations for the proposed libmc library, part of the Elftoolchain project.

Comments welcome.


  1. This sounds really nice!

    Could you comment on how this compares to DTrace?

    1. The projects are orthogonal.

      PMCs measure hardware behavior: cache misses, branch mispredicts, pipeline stalls and so on. DTrace usually measures behavior at a higher level (system calls, function entry/exit, and so on), depending on the "DTrace providers" that are present in the system.

  2. There are two important things missing from PmcTools at the moment:
    1) support for Sandy Bridge and newer Intel processors
    2) support for Instruction-Based Sampling on AMD and Precise Event-Based Sampling on Intel

    It would be very awesome if someone would take a stab at these two.

    1. For Intel processors, Intel's VTune analyser is being ported to FreeBSD, according to a recent report from the FreeBSD Foundation. The person to contact for early access is Jim Harris (jimharris@).