When I recently upgraded my FreeBSD6/amd64 ``-STABLE'' machine, the CVSup binary on the system stopped working. CVSup would dump core consistently, shortly after connecting to the remote server. This rather unwelcome development took away my ability to keep my CVS trees up to date, so fixing the bug became top priority. The bug also turned out to be an interesting one.
The bug
Running CVSup after the upgrade to 6.3-PRERELEASE would result in a core dump shortly after connection establishment.
Program received signal SIGBUS, Bus error.
0x0000000800682d4f in fcntl () from /lib/libc.so.6
(gdb)
(gdb) disassemble fcntl
... snip ...
0x0000000800682d3f <fcntl+79>: movaps %xmm4,0xffffffffffffffc1(%rax)
0x0000000800682d43 <fcntl+83>: movaps %xmm3,0xffffffffffffffb1(%rax)
0x0000000800682d47 <fcntl+87>: movaps %xmm2,0xffffffffffffffa1(%rax)
0x0000000800682d4b <fcntl+91>: movaps %xmm1,0xffffffffffffff91(%rax)
0x0000000800682d4f <fcntl+95>: movaps %xmm0,0xffffffffffffff81(%rax)
0x0000000800682d53 <fcntl+99>: lea 0x110(%rsp),%rax
0x0000000800682d5b <fcntl+107>: movl $0x10,0x20(%rsp)
0x0000000800682d63 <fcntl+115>: movl $0x30,0x24(%rsp)
0x0000000800682d6b <fcntl+123>: mov %rax,0x28(%rsp)
... snip ...
The faulting instruction was trying to save SSE registers to memory. This was odd, since there was no reason for this particular code path to be using SSE registers in the first place.
Rebuilding Modula-3 and CVSup from source did not fix the core dump, though the builds of these tools themselves completed without error. A search through the PR database revealed that other FreeBSD users had also been tripped up by the bug: PR bin/124353.
A peek at the solution
Modula-3's runtime needed to be patched in the following way to fix this fault.
- First, in
$M3SRC/libs/m3core/src/unix/freebsd-4.amd64/Unix.i3, the
Modula-3 function Unix.fcntl() is declared as being
implemented externally by the C function ufcntl().
    ... snip ...
    <*EXTERNAL "ufcntl"*>
    PROCEDURE fcntl (fd, request: int; arg: long): int;
    ... snip ...
- Matching this declaration, an implementation of ufcntl()
is provided in
$M3SRC/libs/m3core/src/runtime/FBSD_AMD64/RTHeapDepC.c:
    ...
    #include <fcntl.h>
    ...
    int ufcntl(int fd, int cmd, long arg)
    {
        return (fcntl(fd, cmd, arg));
    }
On the surface, this "fix" does not seem to be doing anything. The ufcntl() entry point takes three arguments and passes them down to fcntl() unchanged, in the same order.
Yet, despite the apparent ``no op''-like nature of the change, the core dumps were gone.
Why this works
To understand why this fix works, we have to delve into the ABI, specifically the C calling conventions used for AMD64 code.
For normal function calls, the AMD64 calling convention passes up to six integer arguments in registers. Thus register %rdi would hold the first argument (fd in our case), register %rsi the second (cmd), register %rdx the third, and so on. However, the C prototype for fcntl() is: int fcntl(int fd, int cmd, ...);, i.e., fcntl is a varargs function. Varargs functions place an extra requirement on callers on the AMD64: register %rax is a ``hidden'' input parameter for these functions. Its low byte, %al, tells the callee an upper bound on the number of vector (SSE) registers used to pass arguments, and the callee's prologue relies on this value when saving the SSE argument registers. That is why fcntl() was touching %xmm registers at all.
So, prior to the fix, the Modula-3 runtime was invoking fcntl() directly, but with registers set up for a non-varargs function call, leaving %rax holding whatever value happened to be in it.
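To make the varargs side of this concrete, here is a minimal C sketch. The function pick_arg() is hypothetical; it only mimics the shape of fcntl(fd, cmd, ...), and call_with_prototype() is likewise an illustrative name. The point it tries to show is that the caller follows the varargs convention only because the variadic prototype is visible at the call site.

```c
#include <stdarg.h>

/*
 * Hypothetical variadic function with the same shape as fcntl(fd, cmd, ...).
 * It returns the optional third argument when cmd is nonzero, and 0 otherwise.
 */
long pick_arg(int fd, int cmd, ...)
{
    va_list ap;
    long arg = 0;

    (void)fd;               /* unused; present only to mirror fcntl()'s shape */
    va_start(ap, cmd);
    if (cmd != 0)
        arg = va_arg(ap, long);   /* read the optional argument as a long */
    va_end(ap);
    return arg;
}

/*
 * The variadic prototype of pick_arg() is in scope here, so the compiler
 * generates the call using the varargs convention (on AMD64 this includes
 * loading %al before the call instruction).
 */
long call_with_prototype(long value)
{
    return pick_arg(3, 1, value);
}
```

A caller that is not a C compiler, such as another language's runtime binding to the symbol directly, has no prototype to consult and must get these details right on its own.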
Now, as it turns out, in FreeBSD 6.2 and earlier, fcntl() in libc was not a C language function; rather it was implemented as an assembly language stub that invoked the SYS_fcntl system call. On the AMD64, FreeBSD's argument passing convention for system calls is close enough to the non-varargs C calling convention that the processor's registers happened to be correctly set up for a direct system call.
When fcntl() in libc was changed in FreeBSD 6-STABLE on 24 Apr 2008 to be a C function instead of an assembly language system call stub, things broke.
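Though not obvious from just looking at the C code, the no-op like fix above works by using the C compiler to translate between the two calling conventions: the Modula-3 runtime calls ufcntl() as an ordinary fixed-argument function, and the compiler, which can see fcntl()'s varargs prototype, emits a properly set up varargs call.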
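The same shim pattern can be tried outside the Modula-3 tree. The sketch below is an assumption-laden illustration, not the runtime's actual code: ufcntl_sketch() is a hypothetical name chosen to avoid clashing with the real ufcntl(), and demo_pipe_fd_flags() is a made-up helper that exercises the shim on a freshly created pipe.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Fixed-arity shim with the same shape as the ufcntl() shown earlier.
 * The name ufcntl_sketch is hypothetical.
 */
int ufcntl_sketch(int fd, int cmd, long arg)
{
    /*
     * The variadic prototype of fcntl() comes from <fcntl.h>, so the
     * compiler emits a varargs-style call here on the caller's behalf.
     */
    return fcntl(fd, cmd, arg);
}

/*
 * Hypothetical demo: create a pipe, query the descriptor flags of its
 * read end through the shim, then close both ends. Returns the F_GETFD
 * result, or -1 if the pipe could not be created.
 */
int demo_pipe_fd_flags(void)
{
    int fds[2];
    int flags;

    if (pipe(fds) != 0)
        return -1;
    flags = ufcntl_sketch(fds[0], F_GETFD, 0L);
    close(fds[0]);
    close(fds[1]);
    return flags;
}
```

Foreign callers then bind to the fixed-arity entry point and never need to know that the underlying libc function is variadic.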
What's worrying
The relevant change to libc was in CVS/SVN HEAD for about 20 days before it was merged to -stable. CVSup is also a critical tool for the FreeBSD project. Yet this bug was only detected in -stable, and not in -current.