Sorting Processes

A few weeks back, while teaching a SmartOS Internals class, a few of the students had an interesting suggestion for me. It was either Sam Zaydel from Racktop, or one of the Voxer guys that asked me to do a weekly tweet showing a use of mdb, DTrace, or something else in SmartOS that might be interesting. I thought about this for a bit, and I really like the idea, as one of my favorite things is instructing. The ZFS internals class also like the idea and say they have some good questions I could answer this way, so ask @MrBruning your SmartOS, DTrace, or mdb questions!

There’s a few ways to submit questions to me: Email MrBruning@joyent.com, ask on twitter at hashtag #BruningQuestions, or attend one of my classes. Every week, I’ll answer one of your questions in this new column, to be published every Friday on the Joyent blog. And for Sam, Dave, and Mike, thanks for the idea, and I’ll tweet it out just for you. Here we go with the first question! -- Mr. Bruning.

Q. How do I sort processes in the order they last ran?

Occasionally, while examining a system crash (or even the running system), I would like to see a sorted list of processes with respect to the time they last ran. I used to use scat (Solaris Crash Analysis Tool) for this, but found that the tool is no longer available without logging in to Oracle support. So, how to do the same thing with mdb.

# ls -lh *.0-rw-r--r--   1 root     root        1.7M Nov 29 01:40 unix.0-rw-r--r--   1 root     root        260M Nov 29 01:40 vmcore.0-rw-r--r--   1 root     root         59M Nov 14 08:24 vmdump.0# mdb 0Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci ufs ip hook neti sockfs arp usba stmf_sbd stmf zfs lofs mpt idm sd crypto random cpc logindmux ptm sppp nfs ]> ::walk thread | ::print -a -d kthread_t t_hrtime !sed -e 's/t_hrtime = //' -e 's/0t//'  | sort -n -k 2,2  | cut -d " " -f 1 > /tmp/foo> ::offsetof kthread_t t_hrtimeoffsetof (kthread_t, t_hrtime) = 0x370, sizeof (...->t_hrtime) = 8> ::cat /tmp/foo | ::eval ".-370::print kthread_t t_procp | ::print proc_t p_user.u_psargs"p_user.u_psargs = [ "sched" ]p_user.u_psargs = [ "sched" ]…p_user.u_psargs = [ "/usr/sbin/nscd" ]p_user.u_psargs = [ "/usr/sbin/nscd" ]p_user.u_psargs = [ "/lib/svc/bin/svc.configd" ]p_user.u_psargs = [ "sched" ]p_user.u_psargs = [ "/usr/sbin/nscd" ]p_user.u_psargs = [ "zpool-testpool" ]>

I’m running mdb on a kernel crash dump (unix.0 and vmcore.0). The same set of commands would also work on a running system, though may not be entirely accurate (things can change while running the three mdb command lines). Note that some processes show up more than once in the output. This is either because there are multiple instances of the process, or the process is multithreaded.

The following goes through the mdb lines and explains what they do. The line:

> ::walk thread | ::print -a -d kthread_t t_hrtime !sed -e 's/t_hrtime = //' -e 's/0t//'  | sort -n -k 2,2  | cut -d " " -f 1 > /tmp/foo

does the following:

  1. Walk through all of the kernel threads on the system (threads run, processes are just containers for an address space and for one or more threads).

  2. For each kthread_t, print the t_hrtime member. This field is the (high resolution) time that the thread last ran. The “-a” option to print will print the address of the field, and the “-d” will print the value in decimal. The output of ::walk thread | ::print -a -d kthread_t t_hrtime looks like:

    fffffffffbc2fe30 t_hrtime = 0t361484793159ffffff0003e05fb0 t_hrtime = 0ffffff0003e0bfb0 t_hrtime = 0t1970215059950ffffff0003e11fb0 t_hrtime = 0ffffff0003e17fb0 t_hrtime = 0t4834878495122ffffff0003e1dfb0 t_hrtime = 0…
  3. The next step is to pipe the output of ::walk thread | ::print -a -d kthread_t t_hrtime to sed -e ‘s/t_hrtime = //’ -e ‘s/0t//’ which strips t_hrtime and 0t from the output. (The “0t” is mdb’s way of telling you that it is a decimal number. By default, mdb shows hex with ::print).

  4. The output from sed is then piped to sort -n -k 2,2 | cut -d " " -f 1 > /tmp/foo, which sorts the output by the t_hrtime value, and then strips off the time, leaving only the addresses. This output is put into /tmp/foo. This will sort from earliest to latest. Adding “-r” to the sort command will sort from latest to earliest. A quick look at /tmp/foo shows:

    > ::cat /tmp/fooffffff0003e05fb0ffffff0003e11fb0ffffff0003e1dfb0ffffff0003e23fb0ffffff0003e29fb0ffffff0003e2ffb0…

The next step is to get the offset within kthread_t of the t_hrtime member. We need this because we want to get other information from the kthread_t structure. This is done via:

> ::offsetof kthread_t t_hrtimeoffsetof (kthread_t, t_hrtime) = 0x370, sizeof (...->t_hrtime) = 8>

The final line:

> ::cat /tmp/foo | ::eval ".-370::print kthread_t t_procp | ::print proc_t p_user.u_psargs"

does the following:

  1. Pipes the contents of the /tmp/foo file to the ::eval dcmd.
  2. The ::eval “.-370:print kthread_t t_procp | ::print proc_t p_user.u_psargs" line takes each line of input from the ::cat, subtracts 0x370 from the address to get the start of the kthread_t, and prints the command line for the process in which the thread is running. If you wanted to print the stacks instead of the command line, you would use ::cat /tmp/foo | ::eval “.-370::threadlist -v”, or, instead of ::threadlist -v, you could use ::findstack.

What if you wanted to see the order of threads, but only for a specific process? You could use:

> ::pgrep inetd | ::walk thread | ::print -a -d kthread_t t_hrtime !sed -e 's/t_hrtime = //' -e 's/0t//'  | sort  -n -k 2,2  | cut -d " " -f 1 > /tmp/foo> ::cat /tmp/foo | ::eval ".-370::threadlist -v"            ADDR             PROC              LWP CLS PRI            WCHANffffff014aa29860 ffffff01498b7078 ffffff014a597cc0   2  29                0  PC: _resume_from_idle+0xf1    CMD: /usr/lib/inet/inetd start  stack pointer for thread ffffff014aa29860: ffffff00045c6d50  [ ffffff00045c6d50 _resume_from_idle+0xf1() ]    swtch+0x141()    shuttle_swtch+0x203()    door_return+0x214()    doorfs32+0x180()    _sys_sysenter_post_swapgs+0x149()            ADDR             PROC              LWP CLS PRI            WCHANffffff01496c14e0 ffffff01498b7078 ffffff014a5983c0   2  29                0  PC: _resume_from_idle+0xf1    CMD: /usr/lib/inet/inetd start  stack pointer for thread ffffff01496c14e0: ffffff00045c0d20  [ ffffff00045c0d20 _resume_from_idle+0xf1() ]    swtch_to+0xb6()    shuttle_resume+0x2af()    door_return+0x3e0()    doorfs32+0x180()    _sys_sysenter_post_swapgs+0x149()…

What if you wanted to know how long it has been since threads last ran? Or you wanted to look at user level stacks? That will wait for a future blog. To ask this question, or questions of your own, submit to MrBruning@joyent.com or on Twitter at #BruningQuestions



Post written by Joyent