Thursday, July 26, 2012

A build system for the Elftoolchain project

Summary

This post describes the cross-platform build and test system being written for the Elftoolchain project.

The need for a cross-platform build and test system


In the upcoming v1.0 release of the Elftoolchain project we plan to support six operating system families, including prominent *BSD OSes such as FreeBSD and NetBSD, Ubuntu GNU/Linux, and Minix. We may additionally support an Illumos-derived operating system.

For each supported OS, we may need to test our sources on multiple released OS versions. We may need to build and test on multiple architectures (e.g., ARM, MIPS, i386 and X64/AMD64), depending on the target operating system.

Apart from mainline development, we would need to support maintenance branches of our source tree on these OS instances.

Clearly, some kind of automation is needed to help manage this combinatorial increase in support load.
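
To make the growth concrete, here is a small sketch that enumerates an upper bound on the number of build configurations. The OS and architecture names come from the text above; the release and branch labels are placeholders, and in practice not every OS runs on every architecture.

```python
from itertools import product

# Names taken from the post; release and branch labels are placeholders.
operating_systems = ["FreeBSD", "NetBSD", "Ubuntu GNU/Linux", "Minix"]
os_releases = ["release-A", "release-B"]   # hypothetical: two releases per OS
architectures = ["ARM", "MIPS", "i386", "AMD64"]
branches = ["mainline", "maintenance"]     # hypothetical: one maintenance branch

# Upper bound: every OS on every release, architecture and branch.
build_matrix = list(product(operating_systems, os_releases, architectures, branches))
print(len(build_matrix))  # 4 * 2 * 4 * 2 = 64
```

Even with these modest placeholder numbers the matrix has dozens of entries, and each extra release or branch multiplies the total.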

Design goals


In brief, the design goals are:
  • The system should support builds of our source tree on the target operating systems and machine architectures of interest to us.
  • The system should support builds on non-native architectures (relative to the build host).
  • The system should allow a source tree that is in-development to be built and tested, prior to a check-in.
  • The system should be deployable with the minimum of software dependencies, and should be easy to configure.
  • The system should be able to run entirely on a relatively power- and resource-constrained system such as a laptop, i.e., without needing a beefy build box or architecture-specific hardware.

Related Projects

Continuous integration systems such as Buildbot, Bitten and Hudson are commonly used to manage automated builds. While these tools are featureful, their large resource requirements, and the additional dependencies needed to run them (a Java/Python runtime, along with other dependencies) make these tools difficult to use in the Elftoolchain context.

The QEMU and VirtualBox programs are popular machine emulators. When running on X86/X64 hardware, these programs support the emulation of i386 and x86_64/amd64 CPUs. Additionally, QEMU can emulate non-native architectures using dynamic translation techniques. The GXemul project is a BSD-licensed machine emulator, similar to QEMU.

The Design


In the current design, the build system comprises two major parts:

  • A simple daemon: a portable C program built using libevent that runs inside the target OS in the machine emulator. This 'slave' component connects to a 'master/despatcher' component running on the build host, and executes the commands that the master issues to it.
  • A 'master' component that is responsible for managing the build process at the top-level: starting up the relevant machine emulators, waiting for the OS inside to boot and for the 'slave' inside to connect back, transferring the source tree of interest into the slave, running the build/test cycle, collecting output files and output status, and shutting down the emulator cleanly.
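
The master's per-target sequence can be sketched roughly as follows. This is only an illustration in Python, not the project's code: the phase names paraphrase the steps listed above, the target name is made up, and the stub actions stand in for the real emulator and network handling.

```python
from typing import Callable, Dict, List

# Phase names paraphrase the steps listed above; they are not taken from the
# Elftoolchain sources.
PHASES = [
    "start_emulator",
    "wait_for_slave",
    "transfer_sources",
    "run_build_and_tests",
    "collect_results",
]

Action = Callable[[str], bool]  # takes a target name, returns True on success


def run_target(target: str, actions: Dict[str, Action], log: List[str]) -> bool:
    """Run each phase in order, stopping at the first failure.

    The shutdown action is always invoked, even when an earlier phase fails,
    so a broken build does not leave an emulator running.
    """
    succeeded = True
    try:
        for phase in PHASES:
            log.append(phase)
            if not actions[phase](target):
                succeeded = False
                break
    finally:
        log.append("shutdown_emulator")
        actions["shutdown_emulator"](target)
    return succeeded


# Stub actions that always succeed; "netbsd-i386" is a hypothetical target name.
stubs = {name: (lambda target: True) for name in PHASES + ["shutdown_emulator"]}
log: List[str] = []
print(run_target("netbsd-i386", stubs, log), log)
```

Keeping each phase behind a small callable would let the actual work be delegated to shell scripts, in line with the goal of keeping dependencies to a minimum.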

Note that the actual build (and test) within a source tree is controlled using BSD make.

The protocol between the 'slave' and the 'master' components is spartan: it supports the execution of arbitrary shell script fragments on the slave with redirection of input and output, and supports simple data transfer between the 'slave' and the 'master'. In order to maintain responsiveness, the protocol between the 'slave' and the 'master' is asynchronous. Multiple 'slaves' could be connected to the 'master' concurrently.
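
One way such a protocol could be framed on the wire is sketched below. The message types, header layout and request identifiers are assumptions made for illustration, not the format used by the Elftoolchain tools. Tagging each message with a request id is what would let the master match replies that arrive out of order, and an incremental decoder suits event-driven I/O where data arrives in arbitrary chunks.

```python
import struct
from typing import List, Tuple

# Hypothetical message types; the real protocol may differ.
EXEC, DATA, STATUS = 1, 2, 3

# Hypothetical header: message type, request id, payload length (network order).
HEADER = struct.Struct("!BII")

Frame = Tuple[int, int, bytes]


def encode(msg_type: int, request_id: int, payload: bytes) -> bytes:
    """Prefix the payload with a fixed-size header."""
    return HEADER.pack(msg_type, request_id, len(payload)) + payload


class Decoder:
    """Incrementally split a byte stream into complete frames."""

    def __init__(self) -> None:
        self.buffer = bytearray()

    def feed(self, chunk: bytes) -> List[Frame]:
        """Add newly received bytes and return every frame completed so far."""
        self.buffer.extend(chunk)
        frames: List[Frame] = []
        while len(self.buffer) >= HEADER.size:
            msg_type, request_id, length = HEADER.unpack_from(self.buffer)
            end = HEADER.size + length
            if len(self.buffer) < end:
                break  # payload not fully received yet
            frames.append((msg_type, request_id, bytes(self.buffer[HEADER.size:end])))
            del self.buffer[:end]
        return frames
```

For example, a master could send `encode(EXEC, 7, b"make test > test.log 2>&1")` and later recognise the matching reply by its request id of 7, while frames for other requests or other slaves are decoded in between.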

The 'master' would be controlled by a set of configuration files and shell scripts.

The Implementation

The implementation (a work-in-progress) may be found in the tools/build-automation directory of the Elftoolchain project's source tree.

The implementation is being written as a literate program.

Comments welcome.

2 comments:

  1. Your post sounds like you are solving the same problems that I was facing with redports.org, a package-building cluster for FreeBSD; see http://www.redports.org/wiki/Architecture

    For the dispatcher I am working on a small libevent+leveldb based message queue server that implements the STOMP 1.0 protocol. That allows the dispatcher to work fully asynchronously, and the protocol is simple and easy to use.

    1. redports.org looks interesting. I will take a look, thanks!
