When I recently upgraded my FreeBSD6/amd64 ``-STABLE'' machine, the
CVSup binary on the system stopped working. CVSup would dump core
consistently, shortly after connecting to the remote server. This
rather unwelcome development took away my ability to keep my
CVS trees up to date, so fixing the bug became top priority. The bug
also turned out to be an interesting one.
The bug
Running CVSup after the upgrade to 6.3-PRERELEASE would
result in a core dump shortly after connection establishment.
Program received signal SIGBUS, Bus error.
0x0000000800682d4f in fcntl () from /lib/libc.so.6
(gdb)
(gdb) disassemble fcntl
... snip ...
0x0000000800682d3f <fcntl+79>: movaps %xmm4,0xffffffffffffffc1(%rax)
0x0000000800682d43 <fcntl+83>: movaps %xmm3,0xffffffffffffffb1(%rax)
0x0000000800682d47 <fcntl+87>: movaps %xmm2,0xffffffffffffffa1(%rax)
0x0000000800682d4b <fcntl+91>: movaps %xmm1,0xffffffffffffff91(%rax)
0x0000000800682d4f <fcntl+95>: movaps %xmm0,0xffffffffffffff81(%rax)
0x0000000800682d53 <fcntl+99>: lea 0x110(%rsp),%rax
0x0000000800682d5b <fcntl+107>: movl $0x10,0x20(%rsp)
0x0000000800682d63 <fcntl+115>: movl $0x30,0x24(%rsp)
0x0000000800682d6b <fcntl+123>: mov %rax,0x28(%rsp)
... snip ...
The faulting instruction was trying to save SSE registers to
memory. This was odd, since there was no apparent reason for this
particular code path to be using SSE registers in the first
place.
Rebuilding Modula-3 and CVSup from source did not fix the core
dump, though the builds of these tools themselves completed without
error. A search through the PR database revealed that other FreeBSD
users had also been tripped up by the bug: PR bin/124353.
A peek at the solution
Modula-3's runtime needed to be patched in the following way to fix
this fault.
- First, in
$M3SRC/libs/m3core/src/unix/freebsd-4.amd64/Unix.i3, we
declare the Modula-3 function Unix.fcntl() as being
implemented externally by C function ufcntl().
... snip ...
<*EXTERNAL "ufcntl"*> PROCEDURE fcntl (fd, request: int; arg: long): int;
... snip ...
- Matching this declaration, an implementation of ufcntl()
was provided in
$M3SRC/libs/m3core/src/runtime/FBSD_AMD64/RTHeapDepC.c:
...
#include <fcntl.h>
...
int
ufcntl(int fd, int cmd, long arg)
{
    return (fcntl(fd, cmd, arg));
}
On the surface, this "fix" does not seem to be doing anything. The
ufcntl() entry point takes 3 arguments but it passes these
down to fcntl() unchanged, and in the same order.
Yet, despite the apparent ``no op''-like nature of the change, the core
dumps were gone.
Why this works
To understand why this fix works, we have to delve into the ABI,
specifically the C calling conventions used for AMD64 code.
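For normal function calls, the AMD64 calling convention passes up to
6 integer arguments in registers. Thus register %rdi would
hold the first argument (fd in our case), register
%rsi the second, cmd, register %rdx the
third and so on. However, the C prototype for fcntl() is:
int fcntl(int fd, int cmd, ...);, i.e., fcntl is a varargs
function. Varargs functions use a different calling convention on
the AMD64: register %rax is a ``hidden'' input parameter
for these functions. Its low byte, %al, gives the callee an upper
bound on the number of vector (SSE) registers used to pass
arguments, and the callee's prologue uses that value when saving
those registers. This accounts for the SSE register saves seen in
the disassembly of fcntl() above.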
So, prior to the fix, the Modula-3 runtime was invoking
fcntl() directly, but with registers set up for a
non-varargs function call.
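The difference between the two kinds of call can be sketched in C. This is only an illustration: sum_ints() and sum_three() are hypothetical names, not part of libc or of the Modula-3 runtime. The point is that a caller compiled against a prototype ending in ``...'' gets the extra %al setup for free, while a caller that knows nothing about the prototype (such as hand-written or foreign-language glue) does not.

```c
#include <stdarg.h>

/*
 * Hypothetical variadic function, for illustration only.
 * Its prototype ends in "...", just like fcntl()'s.
 */
int
sum_ints(int count, ...)
{
    va_list ap;
    int i, total = 0;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return (total);
}

/*
 * A fixed-argument caller that sees the prototype above. Because the
 * compiler knows sum_ints() is variadic, on AMD64 it also loads %al
 * with an upper bound on the number of vector registers used for
 * arguments (zero here) before making the call.
 */
int
sum_three(int a, int b, int c)
{
    return (sum_ints(3, a, b, c));
}
```

Compiling this with the assembly listing enabled (for example, cc -S) and looking at sum_three() should show %eax or %al being set just before the call to sum_ints(); the exact instruction chosen depends on the compiler.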
Now, as it turns out, in FreeBSD 6.2 and earlier, fcntl()
in libc was not a C language function; rather it
was implemented as an assembly language stub that invoked the
SYS_fcntl system call. On the AMD64, FreeBSD's argument
passing convention for system calls is close enough to the
non-varargs C calling convention that the processor's registers
happened to be correctly set up for a direct system call.
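When fcntl() in libc was changed in FreeBSD
6-STABLE on 24 Apr 2008 to be a C function instead of a system call
stub, things broke: the compiled varargs function now relied on
%rax, which the Modula-3 runtime had never set up.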
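Though not obvious from just looking at the C code, the seemingly
no-op fix above works by using the C compiler to translate between
the two calling conventions. The Modula-3 runtime calls
ufcntl() as an ordinary fixed-argument function, and the compiler,
which sees the varargs prototype of fcntl(), sets up %rax
before calling it.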
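The same pattern can be tried outside the Modula-3 tree. The sketch below is a stand-in, not the actual runtime code: ufcntl_demo(), fixed3_fn and query_access_mode() are names made up for this example. The function pointer plays the role of the foreign caller, which only knows how to make a plain three-argument call.

```c
#include <fcntl.h>
#include <unistd.h>

/* The only shape of call the "foreign" side knows how to make. */
typedef int (*fixed3_fn)(int, int, long);

/*
 * Fixed-argument wrapper (hypothetical name). The compiler sees the
 * variadic prototype of fcntl() in <fcntl.h> and emits the varargs
 * call sequence on our behalf.
 */
static int
ufcntl_demo(int fd, int cmd, long arg)
{
    return (fcntl(fd, cmd, arg));
}

/*
 * Open a file read-only and ask for its access mode through the
 * wrapper. Returns the O_ACCMODE bits, or -1 on error.
 */
int
query_access_mode(const char *path)
{
    fixed3_fn call = ufcntl_demo;
    int fd, flags;

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return (-1);
    flags = call(fd, F_GETFL, 0L);
    close(fd);
    if (flags == -1)
        return (-1);
    return (flags & O_ACCMODE);
}
```

For a file that can be opened, query_access_mode() should report O_RDONLY; for a path that does not exist it returns -1.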
What's worrying
The relevant change to libc was in CVS/SVN HEAD for about
20 days before it was merged to -stable. CVSup is also a critical tool for
the FreeBSD project. Yet this bug was only detected in -stable, and
not in -current.