Core File Creation and Debugging

Download

initfpu-1.0.tar.gz

Background

When doing numerics, bugs in your code often produce invalid results such as nan or inf. It would be very helpful if your program dumped core on the first occurrence of such an invalid result, instead of happily running on and completing "successfully" after writing out hundreds or thousands of lines of complete nonsense.

You can achieve this, if your operating system does not do so by default, by setting the relevant FPU flags so that invalid operations or results generate FPU exceptions. These raise the FPE signal (SIGFPE), which, unless it is handled otherwise, makes your program dump core and exit. The core file, together with the executable itself, can then be used to examine the program and find out what went wrong.

Core File Creation

Even though the following code fragments are written in C, Fortran programmers can also use them: Fortran programs can call C code as long as the calling conventions are respected. We will discuss that later.

Implementation

The following C fragments implement this for Linux and FreeBSD. While setting the relevant FPU flags is OS-dependent, the signal handling is portable.

Setting the relevant FPU Flags

Re-installing the default Signal Handler

Unfortunately, when you use Fortran, the standard Fortran runtime libraries catch the FPE signal themselves, and you do not get the desired behaviour of a core dump and exit. In that case it may help to re-install the default handler for the FPE signal:

#include <signal.h>

  /* Restore the default action for SIGFPE: terminate and dump core. */
  signal(SIGFPE, SIG_DFL);
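The POSIX sigaction() interface does the same job and also reports the previous disposition, which tells you whether a runtime had installed its own handler. A minimal sketch; the helper name restore_default_sigfpe is hypothetical. It only helps if it runs after the runtime has installed its handler, for example right after setting the FPU flags.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical helper: restore the default SIGFPE action (terminate and
   dump core).  Returns 1 if a different disposition was active before
   (a handler, e.g. one installed by a Fortran runtime, or SIG_IGN),
   0 if SIGFPE already had its default action, and -1 on error. */
int restore_default_sigfpe(void)
{
  struct sigaction dfl, old;

  memset(&dfl, 0, sizeof dfl);
  dfl.sa_handler = SIG_DFL;
  sigemptyset(&dfl.sa_mask);

  if (sigaction(SIGFPE, &dfl, &old) != 0)
    return -1;

  /* A handler installed with SA_SIGINFO lives in sa_sigaction, not sa_handler. */
  if ((old.sa_flags & SA_SIGINFO) || old.sa_handler != SIG_DFL)
    return 1;
  return 0;
}
```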

Applications

To make all this easier to use, I put it into a library (see Download above). Autoconf takes care of the correct Fortran name-mangling scheme.

For installation instructions, please refer to the source package. Here I briefly discuss how to use the library.

The demo program initfpu_demo.c is listed below. It sets the FPU flags using this library and then performs an illegal computation (the logarithm of a negative number).

#include <math.h>
#include <stdio.h>
#include <initfpu.h>

int main() {
  initfpu();
  double d=-1.0;
  printf("%f\n", log(d));
  return 0;
}
Normally (without setting the FPU flags) this operation executes without an error and the result nan is produced. This is exactly what we do not want.

Compile, execute and debug the program:

$ gcc -g -o initfpu_demo initfpu_demo.c -I. -L. -linitfpu -lm
$ ./initfpu_demo
Floating point exception (core dumped)
$ gdb initfpu_demo initfpu_demo.core
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
[...]
Core was generated by initfpu_demo.
Program terminated with signal 8, Arithmetic exception.
[...]
What to do with this core file is described in the section Core File Debugging below.

Compare this to the default behaviour if you comment out the line containing the initfpu call:

$ vi initfpu_demo.c
$ gcc -g -o initfpu_demo initfpu_demo.c -I. -L. -linitfpu -lm
$ ./initfpu_demo
nan
$ 

In Fortran, the demo program initfpu_demo_f.f looks like this:

      program initfpu_demo_f
        implicit none
        real*8 d

        call initfpu
        d=-1d0
        write (6,*) dlog(d)
      end program
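The Fortran statement call initfpu works because the Fortran compiler references an external symbol derived from the routine name. With gfortran, and with g77 for names that contain no underscore, that symbol is the lowercase name followed by one trailing underscore; other compilers use other schemes, which is why the library leaves this to Autoconf. A minimal sketch of a Fortran-callable C routine under that gfortran/g77 assumption; the names fpesetup and fpesetup_ are hypothetical, and this version only restores the default SIGFPE action.

```c
#include <signal.h>

/* Hypothetical C routine: restore the default SIGFPE action so that a
   floating-point trap terminates the program with a core dump.
   A complete version would also enable the FPU traps. */
void fpesetup(void)
{
  signal(SIGFPE, SIG_DFL);
}

/* Fortran-callable entry point, assuming the gfortran/g77 convention:
   "call fpesetup" in Fortran references the symbol fpesetup_.
   Fortran passes arguments by reference; this routine takes none. */
void fpesetup_(void)
{
  fpesetup();
}
```

With gfortran, for example, compile the C file with gcc -c and link the resulting object into the Fortran program. Fortran 2003's bind(c) attribute is an alternative that avoids depending on the mangling scheme.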

Configuring the shell's ulimit

Usually Linux distributions are configured to suppress the generation of core files. This happens in /etc/profile or a similar file, with a statement like ulimit -c 0. You need to comment this out or replace it with something like ulimit -c unlimited.

Once the hard limit has been lowered, only root can raise it again. In bash, for example, ulimit -c 0 sets both the soft and the hard limit (unless -S or -H is given), and an unprivileged user can raise the soft limit only up to the hard limit. Therefore, the system must be configured so that core file generation is never suppressed in this way.

This limit is inherited by every process from its parent. Therefore, if you for example start your jobs remotely via ssh, you need to make sure that the remote sshd has the correct ulimit setting, which can usually only be configured by changing the system configuration as root.
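A program can also raise its own soft core-size limit, but never above the hard limit it inherited, so this does not help when the hard limit is already 0. A minimal POSIX sketch using getrlimit/setrlimit; the helper name raise_core_limit is hypothetical.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise this process's soft limit on core file size
   (RLIMIT_CORE) to its hard limit.  An unprivileged process may go up to,
   but not beyond, the hard limit.  Returns 0 on success, -1 on error. */
int raise_core_limit(void)
{
  struct rlimit rl;

  if (getrlimit(RLIMIT_CORE, &rl) != 0)
    return -1;
  rl.rlim_cur = rl.rlim_max;    /* soft := hard (may be RLIM_INFINITY) */
  return setrlimit(RLIMIT_CORE, &rl);
}
```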

Core File Debugging

Now that we have our core file, we want to debug it. For this to be useful, make sure your program has been compiled with the -g option so that the executable contains debugging information.

In the following we demonstrate what to do with the core file once our program has died and dumped core. We load it, together with the binary that created it, into a debugger and examine it:

dominik@daemon ~/initfpu-1.0$ ./initfpu_demo
Floating point exception (core dumped)
dominik@daemon ~/initfpu-1.0$ gdb initfpu_demo initfpu_demo.core
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...
Core was generated by initfpu_demo.
Program terminated with signal 8, Arithmetic exception.
Reading symbols from /lib/libm.so.3...done.
Loaded symbols for /lib/libm.so.3
Reading symbols from /lib/libc.so.5...done.
Loaded symbols for /lib/libc.so.5
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x080485d5 in main () at initfpu_demo.c:38
38        printf("%f\n", log(d));
(gdb) where
#0  0x080485d5 in main () at initfpu_demo.c:38
(gdb) list
38        printf("%f\n", log(d));
39        return 0;
40      }
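There are some interesting pieces of information in this output. The program was terminated by signal 8 (SIGFPE, "Arithmetic exception"), and frame #0 shows that this happened in main() at line 38 of initfpu_demo.c, which is the line containing the log() call. The where command prints the call stack (here it consists of main only), and list shows the surrounding source lines.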
