Bruning Questions: Why Is My Process Exiting?

May 20, 2013 - by Mr. Max Bruning

Last week during my DTrace class, I talked a bit about the proc DTrace provider. This provider allows one to watch fork(2), exec(2), and exit(2) events, as well as sending and receiving of signal.h(3HEAD) events and thread-related events. A question I am sometimes asked is: "I have a process that sometimes exits. I would like to know why it is exiting."

Processes exit for 2 reasons:

  • Either they call exit(2).
  • Or they get a signal.h(3HEAD).

The call to exit(2) is either made explicitly in the code, or is in the crt1.o (C runtime library) if the main() function returns. The entry point for most processes is the _start function. To see that, here is the entry point from the elf header for a given application.

# elfdump -e /bin/vi

ELF Header
  ei_magic:   { 0x7f, E, L, F }
  ei_class:   ELFCLASS32          ei_data:       ELFDATA2LSB
  ei_osabi:   ELFOSABI_SOLARIS    ei_abiversion: EAV_SUNW_CURRENT
  e_machine:  EM_386              e_version:     EV_CURRENT
  e_type:     ET_EXEC
  e_flags:                     0
  e_entry:             0x8078a34  e_ehsize:     52  e_shstrndx:  30
  e_shoff:              0x18bc48  e_shentsize:  40  e_shnum:     31
  e_phoff:                  0x34  e_phentsize:  32  e_phnum:      6
#
# mdb /bin/vi
> 8078a34::dis
_start:                         pushl  $0x0
_start+2:                       pushl  $0x0
_start+4:                       movl   %esp,%ebp
...

And here is the part of the _start that calls exit(2):

# mdb -p 18063  <- run mdb on pid 18063
> _start::dis
_start:                         pushq  $0x0
_start+2:                       pushq  $0x0
...
_start+0x67:                    call   +0x17a   <main>
_start+0x6c:                    pushq  %rax
_start+0x6d:                    pushq  %rax
_start+0x6e:                    movq   %rax,%rdi
_start+0x71:                    call   -0xde    <PLT:exit>
_start+0x76:                    popq   %rdi
_start+0x77:                    popq   %rdi
_start+0x78:                    call   -0xd5    <PLT:_exit>
_start+0x7d:                    hlt
>

Let's try DTrace for the exit(2) system call.

# dtrace -n 'syscall::exit:entry{}'
dtrace: invalid probe specifier syscall::exit:entry{}: probe description syscall::exit:entry does not match any probes

So, the actual system call is different. Let's see if we can find it.

# dtrace -l -n 'syscall::*exit*:entry{}'
   ID   PROVIDER            MODULE                          FUNCTION NAME
 9934    syscall                                               rexit entry
10214    syscall                                            lwp_exit entry
#

Looks like rexit is what we want. Let's try:

# dtrace -qn 'syscall::rexit:entry{printf("%s: pid %d exited with exit value %d\n", execname, pid, arg0);}'
less: pid 18080 exited with exit value 0
man: pid 18079 exited with exit value 0
ksh93: pid 18082 exited with exit value 0
man: pid 18081 exited with exit value 0
...
(^C)
#

If the program is sent a signal, it typically does not call exit(2) directly. Instead, the operating system calls exit on behalf of the receiving process. This is done when the action for the signal is to terminate (unless the signal is being caught or ignored). We can use the proc provider to watch these events. Here's a script to watch for exits of a given process.

#!/usr/sbin/dtrace -qs

proc:::exit
/pid == $target && arg0 <= 3/
{
    printf("%d: exited due to %s\n", pid,
      (arg0 == 1)? "call to exit system call":
       ((arg0 == 2)? "receiving a signal": "receiving a signal and has a core dump"));
}

proc:::signal-send
/args[1]->pr_pid == $target/
{
    printf("%d: sending signo %d to pid %d\n", pid, args[2], $target);
}

syscall::rexit:entry
/pid == $target/
{
    printf("%d exited with exit value = %d\n", pid, arg0);
    ustack();
}

The script captures exit and signal events using the proc provider, and exit(2) system calls using the syscall provider.

Let's try it.

# ./exit.d -c "/bin/echo a"
a
18046 exited with exit value = 0

              libc.so.1`0xfee70dd8
18046: exited due to call to exit system call
#

Let's try with a simple program that should demonstrate all the possibilities. Here's the source for the program

/* exit in different ways */
#include <stdio.h>
#include <stdlib.h>

void
sub(void)
{
    exit(101);
}

int
main(int argc, char *argv[])
{
    char c;
    char *p = NULL;

    c = getchar();

    switch (c) {
    case 's':
        *p = 'a';  /* should SEGV */
        break;
    case 'p':
        p = (char *)main;
        *p = 'a';  /* also SEGV, protection error */
        break;
    case 'e':
        sub();
        break;
    default:
        pause();
    }
}

In one terminal window, we'll run the program:

# ./foo

In another window, we'll start the DTrace script from above.

# pgrep foo  <-- foo is the compiled C program
18110
#
# ./exit.d -p 18110

Back in the first window, typing control-c causes the following to appear in the window running the exit.d script.

# ./exit.d -p 18110
5468: sending signo 2 to pid 18110
18110: exited due to receiving a signal
#
# ptree 5468
2632  /usr/lib/ssh/sshd
  5467  /usr/lib/ssh/sshd
    5468  /usr/lib/ssh/sshd
      5471  -bash
#

So, typing a control-c causes sshd to send a SIGINT signal (signo = 2). Let's run it again, and this time we'll make it exit by sending a signal via kill(1).

# ./foo
[1] 18122
#

And in the second window, we'll run the script:

# ./exit.d -p 18122

Back in the first window:

# kill -TERM 18122
[1]+  Stopped                 ./foo
# fg
./foo
Terminated
#

In the window running the DTrace script:

5471: sending signo 15 to pid 18122
18122: exited due to receiving a signal
#
# ptree 5471
2632  /usr/lib/ssh/sshd
  5467  /usr/lib/ssh/sshd
    5468  /usr/lib/ssh/sshd
      5471  -bash
#

The kill is a bash built-in. This bash is running in the terminal where the foo process was started.

Let's have the program get a SEGV signal.

# ./foo

Start the script again:

# pgrep foo
18127
# ./exit.d -p 18127

And in the foo window, typing an 's' gives:

s
Segmentation Fault (core dumped)
#

And we see this in the window running exit.d

18127: sending signo 11 to pid 18127
18127: exited due to receiving a signal and has a core dump

So, the process "sent" the SEGV to itself.

Next week I'll blog about using DTrace to do debugging. Have fun!

Sign up Now For
Instant Cloud Access

Get Started