Bruning Questions: Where's My Memory Going?

March 15, 2013 - by Mr. Max Bruning

While I was teaching a combined SDC, SmartOS Internals, ZFS Internals, and DTrace course over the last two weeks, a student noted that he thought he was seeing a memory leak while running imgadm import dataset_uuid. He was doing this on his laptop, on top of VirtualBox, and the command eventually failed. So the question he wanted to answer was: "What is using up the memory?" We'll also take a look at how the command failed (it was not because the system ran out of space).

I was able to reproduce the behavior, and if you like, you can follow along by doing the same steps on your SmartOS system; you can get the latest version here. Here are good instructions for getting SmartOS running on VMware Fusion, and a link for doing the same on VirtualBox. If you prefer to run it on bare metal, you can get the USB image from https://download.joyent.com/pub/iso/latest-USB.img.bz2, copy the img to a USB key, and boot your system off of the USB key. There are also links for running on VMware, VirtualBox, or from the USB key on the download SmartOS wiki page.

Assuming you've logged into the global zone as root using ssh, we'll do the following:

# imgadm avail
UUID                                  NAME           VERSION  OS       PUBLISHED           
febaa412-6417-11e0-bc56-535d219f2590  smartos        1.3.12   smartos  2011-04-11T08:45:00Z
7456f2b0-67ac-11e0-b5ec-832e6cf079d5  nodejs         1.1.3    smartos  2011-04-15T22:04:12Z
41da9c2e-7175-11e0-bb9f-536983f41cd8  nodejs         1.1.4    smartos  2011-04-28T17:33:45Z
63ce06d8-7ae7-11e0-b0df-1fcf8f45c5d5  smartos        1.3.13   smartos  2011-05-10T09:25:00Z
184c9b38-ad3d-11e0-bad6-1b7240aaa5fc  smartos        1.3.15   smartos  2011-07-13T11:39:22Z
70856d1a-b860-11e0-9bbf-bf2a7494dd82  smartos        1.4.1    smartos  2011-07-27T15:17:41Z
7abad50e-b865-11e0-8462-730bcfdde020  smartos64      1.4.1    smartos  2011-07-27T15:33:26Z
a0f8cf30-f2ea-11e1-8a51-5793736be67c  standard64     1.0.7    smartos  2012-08-30T21:37:30Z
...
6344c3f4-64d9-11e2-af3d-af797e5b19ca  hadoop         1.0.0    smartos  2013-01-23T05:10:08Z
1327ed36-5130-11e2-95a8-9b5a153adf3e  ubuntu-12.04   2.3.1    linux    2013-01-24T21:31:37Z
46ecf60e-52c8-11e2-b212-9b51fc749547  debian-6.0.6   2.3.1    linux    2013-01-24T21:36:28Z
1d4fd806-6ef8-11e2-93ab-4f22d6d207c9  mongodb        1.4.5    smartos  2013-02-04T18:43:44Z
d2fc0a96-61d1-11e2-8701-c7c137935369  chefserver     1.0.0    smartos  2013-02-04T19:30:16Z
4bf9530a-7ae5-11e2-bb4e-3bad5fbc3de9  sngl           0.99.0   smartos  2013-02-19T19:43:00Z
03d515f2-872c-11e2-8314-8387b5044570  base           1.9.0    smartos  2013-03-07T15:11:00Z
bad2face-8738-11e2-ac72-0378d02f84de  base64         1.9.0    smartos  2013-03-07T15:14:30Z
#

The imgadm(1M) command is explained here, and in the manual page (which currently exists only on SmartOS).

There are currently 105 different images available. My student tried with an Ubuntu image, but we'll try a SmartOS image.

Just to make this clear, the command does not always fail. But while it is running, memory usage always increases.

Here is the command to import the dataset.

# imgadm import a0f8cf30-f2ea-11e1-8a51-5793736be67c
Importing image a0f8cf30-f2ea-11e1-8a51-5793736be67c (standard64 1.0.7) from "https://images.joyent.com"
0% [                             ]  time 2.0s  eta 406.2s
...

While the above is running, in another window we'll run vmstat(1M).

# vmstat 2
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr lf rm s0 s1   in   sy   cs us sy id
 0 0 0 3452628 1400208 724 1763 397 0 40 0 2052 36 -280 47 142 585 6864 2946 14 10 77
 0 0 0 3183736 1080148 1 21  0  0  0  0  0  0  0  0  0  734 2807 1687 21  7 72
 0 0 0 3180548 1076972 0  3  0  0  0  0  0  0  0 109 0  974 2198 2868 16 11 73
 0 0 0 3177216 1073668 0  1  0  0  0  0  0  0  0  0  0  908 2887 1473 19  7 74
 0 0 0 3174452 1070916 0  1  0  0  0  0  0  0  0  0  0  921 2572 1296 17  7 76
...

From the above output (ignoring the first line, which, as with most of the *stat commands, shows averages since boot), the free memory on the machine is steadily dropping.
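
Another way to watch this is to sample the freemem kstat directly (a quick sketch; the kstat name is standard on illumos-based systems, the 5-second interval is arbitrary, and the values are page counts, so multiply by pagesize(1) to get bytes):

# kstat -p unix:0:system_pages:freemem 5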

Let's see if prstat(1M) tells us anything. Basically, the memory is being used either by one or more processes, by files on tmpfs file systems (i.e., files in /tmp), or by the kernel itself (quite possibly by the ZFS ARC).

# prstat -s size -c
Please wait...
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
  3553 root      102M   92M run     59    0   0:00:08 7.2% imgadm/4
  2125 root       35M   18M sleep    1    0   0:00:00 0.0% metadata/3
  2489 root       33M   20M sleep   59    0   0:00:04 0.0% vmadmd/3
  2164 root       19M   10M sleep   55    0   0:00:01 0.0% fmd/26
    10 root       10M 8312K sleep   29    0   0:00:07 0.0% svc.configd/33
     8 root     7800K 5088K sleep   55    0   0:00:01 0.0% svc.startd/12
  3335 root     6976K 4208K sleep   53    0   0:00:07 1.9% sshd/1
  2522 root     6868K 2256K sleep    1    0   0:00:01 0.0% sendmail/1
  2432 smmsp    6568K 1560K sleep    1    0   0:00:00 0.0% sendmail/1
  3370 root     6384K 3700K sleep   59    0   0:00:00 0.0% sshd/1
  3504 root     6384K 3704K run      1    0   0:00:00 0.0% sshd/1
  2450 root     6380K 5136K sleep    1    0   0:00:00 0.0% intrd/1
  2158 root     6344K 3876K sleep   55    0   0:00:01 0.0% nscd/24
    55 root     5988K 2696K sleep   29    0   0:00:00 0.0% syseventd/19
   290 root     5960K 2804K sleep   29    0   0:00:00 0.0% syseventd/19
Total: 60 processes, 365 lwps, load averages: 0.53, 0.18, 0.07
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
  3553 root      102M   93M sleep   23    0   0:00:08 7.3% imgadm/4
  2125 root       35M   18M sleep    1    0   0:00:00 0.0% metadata/3
  2489 root       33M   20M sleep   59    0   0:00:04 0.0% vmadmd/3
  2164 root       19M   10M sleep   55    0   0:00:01 0.0% fmd/26
    10 root       10M 8312K sleep   29    0   0:00:07 0.0% svc.configd/33
     8 root     7800K 5088K sleep   55    0   0:00:01 0.0% svc.startd/12
  3335 root     6976K 4208K sleep   53    0   0:00:07 1.9% sshd/1
  2522 root     6868K 2256K sleep    1    0   0:00:01 0.0% sendmail/1
  2432 smmsp    6568K 1560K sleep    1    0   0:00:00 0.0% sendmail/1
  3370 root     6384K 3700K sleep   59    0   0:00:00 0.0% sshd/1
  3504 root     6384K 3704K sleep    1    0   0:00:00 0.0% sshd/1
  2450 root     6380K 5136K sleep    1    0   0:00:00 0.0% intrd/1
  2158 root     6344K 3876K sleep   55    0   0:00:01 0.0% nscd/24
    55 root     5988K 2696K sleep   29    0   0:00:00 0.0% syseventd/19
   290 root     5960K 2804K sleep   29    0   0:00:00 0.0% syseventd/19
Total: 60 processes, 365 lwps, load averages: 0.53, 0.18, 0.07
...

From the above output, process id 3553 (imgadm) is by far the largest process, and its RSS (Resident Set Size, i.e., amount of physical memory in use) is growing. A closer look at this process shows that it is the node.js engine:

# pgrep imgadm
3553
# pargs 3553
3553:        /usr/node/bin/node /usr/sbin/imgadm import a0f8cf30-f2ea-11e1-8a51-5793736be67c
argv[0]: /usr/node/bin/node
argv[1]: /usr/sbin/imgadm
argv[2]: import
argv[3]: a0f8cf30-f2ea-11e1-8a51-5793736be67c
# ptree 3553
2245  /usr/lib/ssh/sshd
  3334  /usr/lib/ssh/sshd
    3335  /usr/lib/ssh/sshd
      3338  -bash
        3553  /usr/node/bin/node /usr/sbin/imgadm import a0f8cf30-f2ea-11e1-8a5
          3557  /usr/bin/bzip2 -cdfq
          3558  /usr/sbin/zfs receive zones/a0f8cf30-f2ea-11e1-8a51-5793736be67
#

So, imgadm is a node.js script. Is there a problem with node? It turns out that node.js is linked with libumem.so, so we can take a core dump and check it for possible memory leaks.

# pldd 3553
3553: /usr/node/bin/node /usr/sbin/imgadm import a0f8cf30-f2ea-11e1-8a51-579
/lib/libumem.so.1
/lib/libz.so.1
/lib/libssl.so.1.0.0
/lib/libcrypto.so.1.0.0
/lib/libkstat.so.1
/lib/libsocket.so.1
/lib/libnsl.so.1
/usr/lib/libstdc++.so.6.0.13
/lib/libm.so.2
/usr/lib/libgcc_s.so.1
/lib/libpthread.so.1
/usr/lib/libc/libc_hwcap1.so.1
/usr/node/0.8/node_modules/dtrace-provider.node
/lib/libumem_trampoline.so.1
# cd /var/tmp
# gcore 3553
gcore: core.3553 dumped
#

A quick look at the core file for possible memory leaks:

# mdb core.3553
Loading modules: [ libumem.so.1 libc.so.1 ld.so.1 ]
> ::findleaks -f -v
findleaks:                maximum buffers => 20613
findleaks:                 actual buffers => 19722
findleaks: 
findleaks:             potential pointers => 9391153
findleaks:                     dismissals => 3760417       (40.0%)
findleaks:                         misses => 3942678       (41.9%)
findleaks:                           dups => 1668382       (17.7%)
findleaks:                        follows => 19676         ( 0.2%)
findleaks: 
findleaks:              peak memory usage => 299 kB
findleaks:               elapsed CPU time => 0.5 seconds
findleaks:              elapsed wall time => 0.8 seconds
findleaks: 
CACHE     LEAKED   BUFFER CALLER
08619a10      35 08624b10 ?
08619e10       4 0862fce0 ?
0861a010       3 08623a08 ?
0861a410       2 0868dee8 ?
0861ba10       2 08698c40 ?
------------------------------------------------------------------------
   Total      46 buffers, 1688 bytes
>
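
The CALLER column shows only "?" because umem transaction logging was not enabled in this process. If we wanted ::findleaks to attribute leaked buffers to their allocation stacks, one approach (a sketch using the umem_debug(3MALLOC) environment variables; the command line is just the one from this session) is to start the process with logging turned on:

# UMEM_DEBUG=default UMEM_LOGGING=transaction \
    /usr/node/bin/node /usr/sbin/imgadm import a0f8cf30-f2ea-11e1-8a51-5793736be67c

A core dump taken from a run started this way should then let ::findleaks report the allocation callers instead of "?".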

So node does not appear to be leaking memory (only 1688 bytes leaked). If node were growing and we did suspect a leak, we could use DTrace to try to determine where the leak is occurring.

# dtrace -q -n 'syscall::brk:entry/pid == 3553/{printf("brk(%lx) = ", arg0); self->in=1;} syscall::brk:return/self->in/{printf("%lx\n", arg1); self->in=0;}tick-5m{exit(0);}'
brk(cf02000) = 0
brk(d003000) = 0
brk(d104000) = 0
brk(d205000) = 0
brk(d306000) = 0
brk(d407000) = 0
#
# dtrace -q -n 'syscall::brk:entry/pid == 3553/{printf("brk(%lx) = ", arg0); self->in=1;} syscall::brk:return/self->in/{printf("%lx\n", arg1); ustack(); self->in=0;}tick-5m{exit(0);}'
brk(d8df000) = 0

              libc.so.1`_brk_unlocked+0x15
              libumem.so.1`vmem_sbrk_alloc+0xa1
              libumem.so.1`vmem_xalloc+0x45e
              libumem.so.1`vmem_alloc+0x19f
              libumem.so.1`vmem_xalloc+0x45e
              libumem.so.1`vmem_alloc+0x19f
              libumem.so.1`umem_alloc+0xf0
              libumem.so.1`malloc+0x36
              libstdc++.so.6.0.13`_Znwj+0x28
              libstdc++.so.6.0.13`_Znaj+0x1e
              node`_ZN4node6Buffer7ReplaceEPcjPFvS1_PvES2_+0xed
              node`_ZN4node6BufferC1EN2v86HandleINS1_6ObjectEEEj+0x97
              node`_ZN4node6Buffer3NewERKN2v89ArgumentsE+0x172
              node`_ZN2v88internalL30Builtin_HandleApiCallConstructENS0_12_GLOBAL__N_116BuiltinArgumentsILNS0_21BuiltinExtraArgumentsE1EEEPNS0_7IsolateE+0x1d6
              0xba00a336
              0xba021aef
              0x9f20b009
              0xba00db41
              0xba015e82
              0x9f20f940
#
# mdb /usr/node/bin/node
> ::dem node`_ZN4node6Buffer7ReplaceEPcjPFvS1_PvES2_
node`_ZN4node6Buffer7ReplaceEPcjPFvS1_PvES2_ == node`node::Buffer::Replace
>

So the node::Buffer::Replace method is calling malloc(), which in turn needed more memory and called brk(2). The brk(2) system call grows the virtual size of the heap. Again, node is large, but it is most likely not leaking memory.
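
If we did want to see which allocation paths account for most of the growth, a sketch using the pid provider to sum the requested bytes by user stack (the pid and the 60-second window are just this session's values) might look like:

# dtrace -q -n 'pid3553::malloc:entry { @[ustack(8)] = sum(arg0); } tick-60s { exit(0); }'

The stacks with the largest totals show where the memory is going, whether or not it is later freed.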

And the tmpfs file system does not appear to be using the memory:

# df -h | grep swap
swap                   2.1G   884K       2.1G     1%    /etc/svc/volatile
swap                   2.1G    12K       2.1G     1%    /tmp
swap                   2.1G    32K       2.1G     1%    /var/run
# swap -lh
swapfile             dev    swaplo   blocks     free
/dev/zvol/dsk/zones/swap 90,1        4K     2.0G     2.0G
# swap -sh
total: 171M allocated + 37M reserved = 208M used, 2.1G available
#
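
If tmpfs had looked suspicious, a quick way to find the culprits is to total usage per entry under the tmpfs mounts (a sketch; adjust the paths to whatever df showed):

# du -sk /tmp/* /var/run/* 2>/dev/null | sort -n | tail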

Let's take a look at the kernel. The following commands were run a few minutes apart.

# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     180073               703   35%
ZFS File Data              181992               710   35%
Anon                        49035               191    9%
Exec and libs                1282                 5    0%
Page cache                  16457                64    3%
Free (cachelist)             7591                29    1%
Free (freelist)             85440               333   16%

Total                      521870              2038
Physical                   521869              2038
# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     181625               709   35%
ZFS File Data              186546               728   36%
Anon                        49028               191    9%
Exec and libs                1282                 5    0%
Page cache                  16456                64    3%
Free (cachelist)             7593                29    1%
Free (freelist)             79340               309   15%

Total                      521870              2038
Physical                   521869              2038
# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     187335               731   36%
ZFS File Data              177882               694   34%
Anon                        49040               191    9%
Exec and libs                1284                 5    0%
Page cache                  16459                64    3%
Free (cachelist)             7326                28    1%
Free (freelist)             82544               322   16%

Total                      521870              2038
Physical                   521869              2038
#
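
The "ZFS File Data" figure is essentially the ARC's file data buffers, so the ARC can also be watched directly via its kstats (a sketch; the 5-second interval is arbitrary, and the value is in bytes):

# kstat -p zfs:0:arcstats:size 5

Running echo ::arc | mdb -k also prints a fuller breakdown of what the ARC is holding.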

The kernel is growing, but it is not using an excessive amount of memory. Applications are also growing, but not enough to cause problems, and the same can be said for files in /tmp. The conclusion is that there is no memory leak.

My student thought a memory leak was causing the imgadm import to fail. Here is the failure message:

...
74% [=====================        ]  time 372.5s  eta 130.9s
{"name":"imgadm","hostname":"00-0c-29-4e-0c-94","pid":2891,"level":50,"cmd":"/usr/sbin/zfs destroy -r zones/1327ed36-5130-11e2-95a8-9b5a153adf3e-partial","error":{"killed":false,"code":1,"signal":null},"stdout":"","stderr":"cannot destroy 'zones/1327ed36-5130-11e2-95a8-9b5a153adf3e-partial': dataset is busy\ncannot destroy 'zones/1327ed36-5130-11e2-95a8-9b5a153adf3e-partial': dataset is busy\n","cleanDsName":"zones/1327ed36-5130-11e2-95a8-9b5a153adf3e-partial","msg":"error destroying tmp dataset while cleaning up","time":"2013-03-13T13:08:05.784Z","v":0}
imgadm: error (InternalError): uncompression error while importing: exit code 2
#

This occurred on a system where network access was slow and not very reliable. Most likely, the failure is due not to a memory shortage, but to corruption of data received over the network. We may come back to this in a future blog post.

To summarize what to look for when you suspect a memory leak:

  • First, check whether vmstat(1M) shows free memory (and/or free swap) declining over time.
  • If so, use prstat(1M) to check for large, growing processes.
  • If no processes are large and growing, check for files on tmpfs file systems (e.g., /tmp).
  • If there are no large files under tmpfs, check the kernel.
  • If the kernel is fine, there are no large (and growing) tmpfs files, and no large and growing processes, you probably don't have a leak (see the command sketch after this list).
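
Here is a minimal sketch that strings those checks together on SmartOS (the intervals, sample counts, and tmpfs paths are arbitrary choices, not anything imgadm-specific):

# vmstat 5 6                              # 1. is free memory (and swap) declining?
# prstat -s rss -c -n 10 5 2              # 2. any large, growing processes?
# df -h /tmp /var/run /etc/svc/volatile   # 3. anything big on tmpfs?
# echo ::memstat | mdb -k                 # 4. how much is the kernel (and the ARC) using?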

Note that files that have been unlinked but are still open can also use up memory if they are on a tmpfs file system and are still being written to. How to deal with this will be covered in a future blog post.

You can also submit your DTrace, MDB, or SmartOS questions to be answered in a future column by emailing MrBruning@joyent.com, asking at #BruningQuestions, or attending one of my upcoming courses.
