Bruning Questions: Profiling Node.js Processes on SmartOS

November 25, 2013 - by Mr. Max Bruning

Introduction

In this blog post, I will show ways to gather information about Node.js processes running on SmartOS (how to do this on other operating systems is left as an exercise for the reader). Specifically, we'll examine cpu utilization, memory usage, file system I/O, and network I/O, using DTrace plus a few other tools available on SmartOS instances running in the Joyent Public Cloud.

It should be mentioned that as I gathered information to write this post, I found that cloud analytics already does most everything you might want. Also, Brendan Gregg of Joyent has written extensively about performance, both in blog posts and in books. For instance, take a look at Brendan's Blog, specifically the section on Node.js latency. The DTrace scripts and other utilities I show below should work on any SmartOS system, (and most should work on any illumos or Solaris-10/11 based systems).

Most (all?) of the DTrace scripts already exist in one form or another at Brendan Gregg's DTraceToolkit. The DTraceToolkit contains a set of commands that use DTrace to measure CPU utilization, file system I/O, network I/O, and more.

Most of the following will work on any system that supports DTrace (with a few, if any, minor modifications). And everything will work with other processes besides node.

The D scripts and other tools used in this post could be used as a basis for metering or monitoring your node applications.

CPU Utilization

The simplest command to measure CPU utilization for a given Node.js process is prstat(1M).

(Note that top is available and can be installed using pkgin(1)).

Here is example output:

# pgrep node
94508

# prstat -c -p 94508
Please wait...
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
 94508 root       36M   20M cpu1     1    0   0:26:56  50% node/2
Total: 1 processes, 2 lwps, load averages: 1.47, 2.28, 2.22
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
 94508 root       36M   20M cpu0     1    0   0:27:01  50% node/2
Total: 1 processes, 2 lwps, load averages: 1.43, 2.26, 2.21
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
 94508 root       36M   20M cpu1     1    0   0:27:06  50% node/2
Total: 1 processes, 2 lwps, load averages: 1.40, 2.24, 2.21
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
 94508 root       36M   20M cpu1     1    0   0:27:11  50% node/2
Total: 1 processes, 2 lwps, load averages: 1.37, 2.22, 2.20
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
 94508 root       36M   20M run      1    0   0:27:16  50% node/2
Total: 1 processes, 2 lwps, load averages: 1.34, 2.20, 2.19
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
 94508 root       36M   20M cpu0     1    0   0:27:21  50% node/2
Total: 1 processes, 2 lwps, load averages: 1.31, 2.18, 2.19
(^C)
#

The above output shows a node process that uses about 50% of the CPU available on the system. The system being used has 2 cpus, so the amount of CPU that is being used is equivalent to 1 processor's worth. In general, prstat shows recent CPU usage by the process.

What if we wanted a closer look?

# prstat -cmL -p 94508
Please wait...
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID 
 94508 root     100 0.0 0.0 0.0 0.0 0.0 0.0 0.3   0 584  14   0 node/1
 94508 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.2 476   1 476   0 node/2
Total: 1 processes, 2 lwps, load averages: 0.93, 1.25, 1.75
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID 
 94508 root      99 0.0 0.0 0.0 0.0 0.0 0.0 0.9   0 629   0   0 node/1
 94508 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.3 478   1 478   0 node/2
Total: 1 processes, 2 lwps, load averages: 0.94, 1.25, 1.75
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID 
 94508 root     100 0.0 0.0 0.0 0.0 0.0 0.0 0.1   0 297   0   0 node/1
 94508 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.1 474   0 474   0 node/2
Total: 1 processes, 2 lwps, load averages: 0.95, 1.25, 1.75

The "-L" option shows a row for each thread in the process, and "-m" shows microstate information for each thread. From this output, 1 thread of node is spending all available time running user level code, the other thread is sleeping (SLP = 100) all of the time. The total CPU usage is shown as a percentage (the sum of the USR, SYS, RTP, TFL, and DFL fields. See the man page).

While we're at it, let's look at the CPUs on this system.

# psrinfo -vp
The physical processor has 1 virtual processor (0)
  x86 (GenuineIntel 306A9 family 6 model 58 step 9 clock 2392 MHz)
   Intel(r) Core(tm) i7-3635QM CPU @ 2.40GHz
The physical processor has 1 virtual processor (1)
  x86 (GenuineIntel 306A9 family 6 model 58 step 9 clock 2392 MHz)
   Intel(r) Core(tm) i7-3635QM CPU @ 2.40GHz
#

Let's use DTrace to get the exact amount of CPU that the process is using. The following script (cpu_usage.d), will print out once per second the amount of time spent by a given process (pid passed in as an argument, which is $target in the script). It shows the amount of time spent executing on each processor, as well as the total time. Note that for threads that are switching on and off cpu very often, it may be better to use the DTrace profile provider rather than the sched provider as it may have less overhead, though it won't be as accurate.

#!/usr/sbin/dtrace -qs

sched:::on-cpu
/pid == $target/
{
    self->ts = timestamp;
}

sched:::off-cpu
/self->ts/
{
    @[cpu] = sum(timestamp - self->ts);
    @total = sum(timestamp - self->ts);
    self->ts = 0;
}

tick-1sec
{
    printf("%12s %12s\n", "CPU ID", "NSECS");
    printa(@); 
    printa("Total time in nsecs = %@d\n", @total);
    trunc(@);
    trunc(@total);
}

Running the above for a given Node.js process:

# ./cpu_usage.d -p `pgrep node`
      CPU ID        NSECS
        1          1261605
        0        998760443
Total time in nsecs = 1000022048

      CPU ID        NSECS
        1          6171524
        0        986705066
Total time in nsecs = 992876590

      CPU ID        NSECS
        0          3576080
        1        983075132
Total time in nsecs = 986651212

      CPU ID        NSECS
        1          6374274
        0        997694246
Total time in nsecs = 1004068520
(^c)
#

This is for the same compute-bound node program shown earlier. And, of course, this could be for any process, not just node. Note that not all of the work is done only on 1 cpu. We could break this down further by adding the thread id (tid) as a key. Here is the cpu_usage_thread.d script.

#!/usr/sbin/dtrace -qs

sched:::on-cpu
/pid == $target/
{
    self->ts = timestamp;
}

sched:::off-cpu
/self->ts/
{
    @[cpu, tid] = sum(timestamp - self->ts);
    @total = sum(timestamp - self->ts);
    self->ts = 0;
}

tick-1sec
{
    printf("%10s %8s %10s", "CPU ID", "TID", "NSECS");
    printa(@); 
    printa("Total time in nsecs = %@d\n\n", @total);
    trunc(@);
    trunc(@total);
}

And the output:

# ./cpu_usage_thread.d -p `pgrep node`
    CPU ID      TID      NSECS
        1        2           962332
        1        1          9746023
        0        1        970480428
Total time in nsecs = 981188783

    CPU ID      TID      NSECS
        1        2           959221
        1        1          4316602
        0        1       1000276644
Total time in nsecs = 1005552467
(^c)
#

Expressing CPU usage as a percentage can have multiple meanings. It could be the percentage of the total amount of CPU available on the system. On a virtualized SmartOS instance (i.e., zone), it could mean a percentage of the CPU cap for that zone. For the first meaning, on a system with 2 CPUs, a value of 1005552467 nsecs in 1 second would be just over 50% (1005552467/2000000000, where 2000000000 nsecs is 2 seconds worth of CPU time). To find the cap for a zone, use prctl(1).

# prctl -n zone.cpu-cap -i zone $(zonename)
zone: 63: d6e7ac12-a60c-4a1d-9aea-d8d9983ee6c8
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
zone.cpu-cap
        usage               2    
        privileged        600       -   deny                                 -
        system          4.29G     inf   deny
#

So, the CPU cap on this instance is 600, or 6 CPUs. The system considers the zone to be "bursting" if the CPU usage is above cpu-baseline:

# prctl -n zone.cpu-baseline -i zone $(zonename)
zone: 63: d6e7ac12-a60c-4a1d-9aea-d8d9983ee6c8
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
zone.cpu-baseline
        usage              40    
        privileged         40       -   none                                 -
        system          4.29G     max   none                                 -
#

So, when processes in the zone use more than 40% of 1 CPU in total, the zone is "bursting". Let's run the compute bound node process multiple times on a system with the same cap and baseline, but with more (16) CPUs.

# psrinfo
0    on-line   since 02/12/2013 01:05:02
1    on-line   since 02/12/2013 01:05:09
2    on-line   since 02/12/2013 01:05:09
3    on-line   since 02/12/2013 01:05:09
4    on-line   since 02/12/2013 01:05:09
5    on-line   since 02/12/2013 01:05:09
6    on-line   since 02/12/2013 01:05:09
7    on-line   since 02/12/2013 01:05:09
8    on-line   since 02/12/2013 01:05:09
9    on-line   since 02/12/2013 01:05:09
10   on-line   since 02/12/2013 01:05:09
11   on-line   since 02/12/2013 01:05:09
12   on-line   since 02/12/2013 01:05:09
13   on-line   since 02/12/2013 01:05:09
14   on-line   since 02/12/2013 01:05:10
15   on-line   since 02/12/2013 01:05:10
#

Running 5 instances of the process shows the following in prstat(1M).

# prstat -mcLv -p `pgrep -d "," node`
Please wait...
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID 
 24382 root     100 0.1 0.0 0.0 0.0 0.0 0.0 0.1   1 227  40   0 node/1
 24386 root     100 0.1 0.0 0.0 0.0 0.0 0.0 0.1   1 217  40   0 node/1
 24389 root     100 0.1 0.0 0.0 0.0 0.0 0.0 0.1   2 202  41   0 node/1
 24321 root     100 0.1 0.0 0.0 0.0 0.0 0.0 0.1   3 219  39   0 node/1
 24245 root     100 0.1 0.0 0.0 0.1 0.0 0.0 0.1   5 225  39   0 node/1
 24321 root     0.0 0.1 0.0 0.0 0.0 0.0 100 0.1 499   1 499   0 node/2
 24389 root     0.0 0.1 0.0 0.0 0.0 0.0 100 0.1 500   1 500   0 node/2
 24382 root     0.0 0.1 0.0 0.0 0.0 0.0 100 0.0 500   0 500   0 node/2
 24386 root     0.0 0.1 0.0 0.0 0.0 0.0 100 0.0 499   1 499   0 node/2
 24245 root     0.0 0.1 0.0 0.0 0.0 0.0 100 0.0 500   0 500   0 node/2
Total: 5 processes, 10 lwps, load averages: 2.33, 0.59, 0.20
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID 
 24386 root     100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   0 142   0   0 node/1
 24389 root     100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   0 102   0   0 node/1
 24382 root     100 0.0 0.0 0.0 0.0 0.0 0.0 0.1   0 218   0   0 node/1
 24321 root     100 0.0 0.0 0.0 0.0 0.0 0.0 0.1   3 112   0   0 node/1
 24245 root     100 0.0 0.0 0.0 0.0 0.0 0.0 0.1   4 220   0   0 node/1
 24321 root     0.0 0.1 0.0 0.0 0.0 0.0 100 0.1 500   1 500   0 node/2
 24245 root     0.0 0.1 0.0 0.0 0.0 0.0 100 0.0 501   2 500   0 node/2
 24386 root     0.0 0.1 0.0 0.0 0.0 0.0 100 0.0 500   1 500   0 node/2
 24382 root     0.0 0.1 0.0 0.0 0.0 0.0 100 0.0 501   1 500   0 node/2
 24389 root     0.0 0.1 0.0 0.0 0.0 0.0 100 0.1 500   0 500   0 node/2
Total: 5 processes, 10 lwps, load averages: 2.54, 0.66, 0.23
...
(^c)
#

From this output, the 5 processes are spending 100% each in User mode (executing code in the application). Let's see what the current CPU usage is with respect to the CPU cap.

# prctl -n zone.cpu-cap -i zone $(zonename)
zone: 63: d6e7ac12-a60c-4a1d-9aea-d8d9983ee6c8
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
zone.cpu-cap
        usage             356    
        privileged        600       -   deny                                 -
        system          4.29G     inf   deny                                 -
#

So, even though there are 5 processes running non-stop, the system only counts 356 towards the cap of 600, (i.e., roughly 3.5 CPUs worth). Note that you may see the numbers being more erratic. One would expect the current usage to be slightly more than 500 (each of the node processes has 2 threads, one thread is doing almost all of the work, but the other thread is also doing some work (500 system calls). Using DTrace on the node processes gives exact times.

# ./cpu_usage_name.d node
      CPU ID        NSECS
        6         17358262
       14        120929836
        8        138903983
       13        170075625
       11        209486653
        2        214565944
        4        227744020
       10        293542515
       12        338535596
        5        365796020
        1        377194770
       15        444786533
        7        470248592
        0        496330717
        3        656849191
        9        838369383
Total time in nsecs = 5380717640

      CPU ID        NSECS
        8         27498575
        1         74287799
        4         77530043
       14         77689710
       10        180125042
       15        207094470
       11        284097209
       12        290378415
        0        294754013
        2        301221145
        9        316360663
        6        337518185
        7        367458998
        5        395776929
       13        478533303
        3        497458312
Total time in nsecs = 4207782811

(^c)
#

So sometimes the processes are over the cap, and sometimes under the cap. The cpu_usage_name.d script is a slight modification from the scripts shown earlier.

#!/usr/sbin/dtrace -qs

sched:::on-cpu
/execname == $$1/
{
    self->ts = timestamp;
}

sched:::off-cpu
/self->ts/
{
    @[cpu] = sum(timestamp - self->ts);
    @total = sum(timestamp - self->ts);
    self->ts = 0;
}

tick-1sec
{
    printf("%12s %12s", "CPU ID", "NSECS");
    printa(@); 
    printa("Total time in nsecs = %@d\n\n", @total);
    trunc(@);
    trunc(@total);
}

The CPU usage calculation used to enforce the CPU cap is performed once every 10msecs (clock tick). According to a comment (and the code itself), if a thread uses multiple clock ticks without going off CPU, it will only be charged 1 tick towards its cap. The relevant code is in cpucaps.c. Note specifically the comment starting at line 39, and the caps_charge_adjust() function in that file. For compute-bound threads, the time slice for which they are scheduled will often be longer than 1 clock tick. We could check this further with DTrace, but will need to do that in the global zone. Using DTrace to show that the discrepancy can be explained by this is left as an exercise.

In this example, we only examined CPU-bound node processes. We could spend more time here using DTrace and flamegraphs. Flamegraphs are a visualization tool that allow you to see where your app is spending its time.

File System Utilization

To "see" file system utilization system wide within a SmartOS virtualized instance, you can use vfsstat(1M). The following gives output once a second. The first line should be ignored; it is an average since the system was booted. See the man page for a description of the columns.

# vfsstat 1
  r/s   w/s  kr/s  kw/s ractv wactv read_t writ_t  %r  %w   d/s  del_t zone
 10.2   0.0   7.8   0.1   0.0   0.0    0.0    0.0   0   0   0.0   32.0 d6e7ac12 (63)
  0.0   0.0   0.0   0.0   0.0   0.0    0.0    0.0   0   0   0.0    0.0 d6e7ac12 (63)
  0.0   0.0   0.0   0.0   0.0   0.0    0.0    0.0   0   0   0.0    0.0 d6e7ac12 (63)
  0.0   0.0   0.0   0.0   0.0   0.0    0.0    0.0   0   0   0.0    0.0 d6e7ac12 (63)
(^c)
#

To examine file system I/O by node processes, we'll watch system calls that are accessing files. We'll start with a simple example that simply reports the number of file system I/O calls per second made by a node app.

Here's the script:

#!/usr/sbin/dtrace -qs

syscall::read:entry,syscall::readv:entry,syscall::pread:entry
/pid == $target && fds[arg0].fi_fs == "zfs"/
{
     @r[pid] = count();
}

syscall::write:entry,syscall::writev:entry,syscall::pwrite:entry
/pid == $target && fds[arg0].fi_fs == "zfs"/
{
     @w[pid] = count();
}

tick-1s
{
     printf("\nIOPS\n");
     printa("Read operations: pid = %d, #io = %@d\n", @r);
     printa("Write operations: pid = %d, #io = %@d\n", @w);
     trunc(@r);
     trunc(@w);
}

Running the script on some process:

# ./node_fsrw.d -p 7767

IOPS

IOPS
Write operations: pid = 7767, #io = 7

IOPS

IOPS
Write operations: pid = 7767, #io = 11
...
(^c)
#

That tells me about IOPS – what about throughput, i.e., the number of bytes read/written?

#!/usr/sbin/dtrace -qs

syscall::read:entry,syscall::readv:entry, syscall::pread:entry
/pid == $target && fds[arg0].fi_fs == "zfs"/
{
    self->inr = 1;
    self->fd = arg0;
}

syscall::read:return,syscall::readv:return, syscall::pread:return
/self->inr && pid == $target && fds[self->fd].fi_fs == "zfs" && arg1 >= 0/
{
    @r = sum(arg1);
    self->inr = 0;
}

syscall::write:entry,syscall::writev:entry, syscall::pwrite:entry
/pid == $target && fds[arg0].fi_fs == "zfs"/
{
    self->inw = 1;
    self->fd = arg0;
}

syscall::write:return,syscall::writev:return, syscall::pwrite:return
/self->inw && pid == $target && fds[self->fd].fi_fs == "zfs" && arg1 >= 0/
{
    @w = sum(arg1);
    self->inw = 0;
}

tick-1s
{
    printf("\nThroughput in bytes/second\n");
    printa("Read: %@d\n", @r);
    printa("Write: %@d\n", @w);
    trunc(@r);
    trunc(@w);
}

The above script watches for read/write related system calls being made on files that are in zfs file systems. For each read/write, it sums the number of bytes read/written in aggregates and prints them out once a second. Note that for other file systems (for instance, on BSD, the value of "zfs" will be different).

Here is output from the script.

# ./node_rwthruput.d -p 7767

Throughput in bytes/second
Throughput in bytes/second
Throughput in bytes/second
Throughput in bytes/second
Throughput in bytes/second
Write: 3056

Throughput in bytes/second
Write: 6112
(^c)
#

Using DTrace, we can drill down to find out what file(s) are being read/written as well as time distribution showing how long they are taking. We can also get stack backtraces showing where the reads/writes are occurring in the code.

Network Utilization

For network utilization, we'll look at the number of packets sent/received, and the number of bytes sent/received. To see this for each network device system wide, use netstat(1M):

# netstat -I net1 -i 1
    input   net1      output          input  (Total)    output
packets errs  packets errs  colls  packets  errs  packets errs  colls 
22055630 0     8531    0     0      29403866 0     316911  0     0     
19       0     0       0     0      27       0     1       0     0     
10       0     0       0     0      17       0     1       0     0     
8        0     0       0     0      14       0     1       0     0     
14       0     0       0     0      21       0     1       0     0     
11       0     0       0     0      16       0     1       0     0     
7        0     0       0     0      10       0     1       0     0
(^c)
#

Again, the first line can be ignored (it shows statistics since the system was booted). If you are running in the global zone, you can use the nicstat(1M) command to get throughput in both packets and data sent/received per second.

To watch network traffic for node processes, we'll use DTrace to watch I/O-related system calls on sockets. Note that this is not necessarily the number of packets sent/received, but is a reasonable approximation. An accurate count requires access to the kernel in the global zone.

#!/usr/sbin/dtrace -qs

syscall::read:entry,
syscall::readv:entry,
syscall::recv:entry
/pid == $target && fds[arg0].fi_fs == "sockfs"/
{
    @r = count();
}

syscall::write:entry,
syscall::writev:entry,
syscall::send:entry
/pid == $target && fds[arg0].fi_fs == "sockfs"/
{
    @w = count();
}

tick-1sec
{
    printf("%13s   %14s\n", "INPUT PACKETS", "OUTPUT PACKETS");
    printa("%@13d   %@14d\n", @r, @w);
    trunc(@r);
    trunc(@w);
}

And here is output from the script for a given node process.

# ./net_iops.d -p 82463
INPUT PACKETS   OUTPUT PACKETS
INPUT PACKETS   OUTPUT PACKETS
INPUT PACKETS   OUTPUT PACKETS
INPUT PACKETS   OUTPUT PACKETS
INPUT PACKETS   OUTPUT PACKETS
INPUT PACKETS   OUTPUT PACKETS
           14               22
INPUT PACKETS   OUTPUT PACKETS
          129              128
INPUT PACKETS   OUTPUT PACKETS
           57               50
INPUT PACKETS   OUTPUT PACKETS
INPUT PACKETS   OUTPUT PACKETS
INPUT PACKETS   OUTPUT PACKETS
(^c)
#

To see the throughput (i.e., the number of bytes read/written, we'll use:

#!/usr/sbin/dtrace -qs

syscall::read:entry,
syscall::readv:entry,
syscall::recv:entry,
syscall::recvfrom:entry
/pid == $target && fds[arg0].fi_fs == "sockfs"/
{
    self->rfd = arg0;
}

syscall::read:return,
syscall::readv:return,
syscall::recv:return,
syscall::recvfrom
/pid == $target && fds[self->rfd].fi_fs == "sockfs" && arg0 > 0/
{
    @r = sum(arg0);
}

syscall::write:entry,
syscall::writev:entry,
syscall::send:entry,
syscall::sendto:entry
/pid == $target && fds[arg0].fi_fs == "sockfs"/
{
    self->wfd = arg0;
}

syscall::write:return,
syscall::writev:return,
syscall::send:return,
syscall::sendto:return
/pid == $target && fds[self->wfd].fi_fs == "sockfs" && arg0 > 0/
{
    @w = sum(arg0);
}

tick-1sec
{
    printf("%13s   %14s\n", "INPUT BYTES", "OUTPUT BYTES");
    printa("%@13d   %@14d\n", @r, @w);
    trunc(@r);
    trunc(@w);
}

And here is output for the script.

# ./net_throughput.d -p 82463
  INPUT BYTES     OUTPUT BYTES
  INPUT BYTES     OUTPUT BYTES
  INPUT BYTES     OUTPUT BYTES
         4785            12412
  INPUT BYTES     OUTPUT BYTES
         7920            20544
  INPUT BYTES     OUTPUT BYTES
         3795             9844
  INPUT BYTES     OUTPUT BYTES
  INPUT BYTES     OUTPUT BYTES
(^c)
#

Note that there are DTrace probes built in to node itself that give you information about garbage collection, client and server requests and responses, and other information. See DTrace Probes for node for more information. You can also add your own DTrace probes to your node application (see node-dtrace-provider). And if you are using the node-restify, there are built-in DTrace probes for each route that your app specifies.

Memory Utilization

For system-wide memory utilization, you can use vmstat(1M). For memory utilization for a specific zone (a SmartOS virtualized instance), use zonememstat(1M). For memory utilization by a specific node process, use pmap(1). For instance:

# while true; do pmap -x 82463 | egrep 'Address|total'; sleep 1; done
 Address  Kbytes     RSS    Anon  Locked Mode   Mapped File
total Kb   51400   42048   32920       -
 Address  Kbytes     RSS    Anon  Locked Mode   Mapped File
total Kb   51400   42056   32920       -
(^c)
#

The important columns in the pmap(1) output are Kbytes and RSS. These are virtual address space size (in kbytes), and the Resident Set Size (amount in kbytes that is physically allocated in memory) respectively. If Kbytes grows over time, it may be a sign that you have a memory leak. But it could also simply mean that your node process is busy.

Note that for large RSS processes, doing this every second adds a lot of overhead. It is probably better once every 5 to 30 seconds.

To DTrace heap growth for a process, you can use:

# dtrace -n 'syscall::brk:entry/pid ==
      $target/{@[jstack(500,8000)]=count();} tick-60s{printa(@);
      trunc(@);}' -p `pgrep node`

Note that node (and other applications), can grow space on the heap using brk(2), but they can also grow space using mmap(2). See Brendan's blog: Memory Leak (and Growth) Flame Graphs and Bruning Questions: Where's My Memory Going? for details.

Conclusions

Using DTrace to get information about Node.js applications allows you to examine the entire software stack. It also works both at a macro level (see the entire system), and allows you to drill down to see individual processes at the instruction level.

Note: vmstat has been extensively covered by Brendan Gregg in a series of short videos:

:

Sign up Now for Instant Cloud Access

Get Started

View PricingSee Benchmarks