MDB and Node.js

This post is an updated and expanded version of one I posted in 2011. That post was written for an early prototype of the MDB Node.js support. Much has changed since then. This version describes the current command set and also includes much more detail about using MDB for Node.js.

MDB is the illumos modular debugger. MDB is widely used on illumos systems (including SmartOS) to debug the live kernel, kernel crash dumps, processes, and process core dumps. In this post, I'll describe MDB's support for debugging Node.js programs.

MDB's focus is on inspecting program state -- and letting application developers build custom tools to inspect their programs -- rather than on source code analysis and control flow. While MDB supports breakpoints and single-stepping through assembly, it doesn't support JavaScript (or even C) source-level debugging. What it can do is show you all of the state in your program: JavaScript objects.

Why MDB?

While console.log debugging is effective for reproducible problems in controlled environments, it doesn't work very well in production for a bunch of good reasons:

  • Each lap is expensive. Even with a nimble deployment process, it takes time to get new code deployed to production. If you're skipping deployment and editing live code, you risk taking down your service with a typo, leaving code running in production that's not checked into the repo, or forgetting to update other instances of the service.
  • Each lap is disruptive. You usually have to restart a service to pick up code changes, and that's usually visible as downtime, however brief, to users of your service. (If you use dynamic monkey patching to avoid this, you risk other, more difficult-to-debug issues related to having run an untested combination of components in the same process.)
  • Production problems are often hard to reproduce. Problems in distributed systems are especially hard to reproduce on the instance that you're debugging.
  • Worse, restarts often make the original problem go away. Many problems result from bad in-memory state that's recreated on restart, so if you restart the service, you lose all hope of debugging it.

So it's very useful to be able to debug a running program while it's broken. Besides using MDB, other popular techniques include building a REPL into your program or some other way to pull out program state. These are extremely useful, but have their own limitations:

  • They don't work to debug crashes, since the program's gone at that point.
  • They usually don't work to debug infinite loops, since the program's stuck.
  • They can't show you all of the state in your program, like the files it hasopen, JavaScript objects that aren't referenced by these tools, or theC-level state used by native add-ons.

Getting started

Prerequisites

All of the following works out-of-the-box on recent SmartOS builds, as well as SmartOS instances provisioned on the Joyent Public Cloud.

You can also run the Node example on a GNU/Linux system, but you'll have to save a core file and run the MDB parts on SmartOS (e.g., using mlogin on Manta).

Working through an example

Here's an example you can follow along with. My program is called loop.js:

/* * This script demonstrates a simple stack trace involving an anonymous * function, then goes into an infinite loop so you can catch it with the * debugger.  One caveat: we grab a stack trace at the start to force V8 to * compute line number information for this script.  If we didn't have this, the * debugger would only print out position-in-file information. */new Error().stack;function main(arg) { func1(); }function func1() { func2(); }function func2(){        (function () {                for (;;)                        ;        })();}main({ 'hello': 'world' });

Run this program in the background with node v0.10.24:

$ node -vv0.10.24$ node loop.js &[2] 17334$

and attach MDB to that process (pid 17334):

$ mdb -p 17334Loading modules: [ ld.so.1 libumem.so.1 libc.so.1 ]>

When you attach MDB to a process like this, it stops the process and gives youa prompt. ::status is a good place to start (most MDB commands start with::):

> ::statusdebugging PID 17334 (32-bit)file: /home/dap/node-v0.10.24-sunos-x86/bin/nodethreading model: native threadsstatus: stopped by debugger

Don't worry if you don't know what all that means. MDB is telling you that it'sattached to pid 17334, which is a 32-bit process, and it's currently stopped.

In order to get any of the Node-specific debugger commands, you have to run::load v8:1

> ::load v8V8 version: 3.14.5.9Autoconfigured V8 support from targetC++ symbol demangling enabled

Now you can get a combined JavaScript/C stack trace with ::jsstack:

> ::jsstack8047620 0xafa59dee  (as ) (a0784dc5)    file: /home/dap/demo/loop.js    posn: line 15804763c 0xafa59951 func2 (a0781379)8047654 0xafa597eb func1 (a0781355)804766c 0xafa5974b main (a0781331)804768c 0xafa53c9c  (as ) (a0781285)80476c0 0xafa141a5 80476f8 0xafa53697  (as Module._compile) (8e22e04d)8047718 0xafa4f799  (as Module._extensions..js) (8e22e08d)804773c 0xafa4c73f  (as Module.load) (8e22e005)804777c 0xafa3fa37  (as Module._load) (8e22dfbd)804779c 0xafa3f593  (as Module.runMain) (8e22e0d5)80477d0 0xafa298f6 startup (a072461d)80477ec 0xafa2910b  (as ) (a0723f95)804780c 0xafa0fe19 8047848 0xafa0a0aa 80478c8_ZN2v88internalL6InvokeEbNS0_6HandleINS0_10JSFunctionEEENS1_INS0_6ObjectEEEiPS5_Pb+0x1018047908 v8::internal::Execution::Call+0xc98047968 v8::Function::Call+0x10b80479d8 node::Load+0x14e8047a38 node::Start+0x1588047a58 main+0x1b8047a70 _start+0x83

You can ignore the first column, as well as the pointers that start with0x.2 The top frame is what you'd expect: it's our anonymousfunction, created at line 15. That was called by func2, which was called byfunc1, which was called by main. After that you see the JavaScript and Cframes that are part of Node itself.

You can get more information about the whole stack using ::jsstack -v:

> ::jsstack -v8047620 0xafa59dee  (as ) (a0784dc5)    file: /home/dap/demo/loop.js    posn: line 15804763c 0xafa59951 func2 (a0781379)    file: /home/dap/demo/loop.js    posn: line 138047654 0xafa597eb func1 (a0781355)    file: /home/dap/demo/loop.js    posn: line 11804766c 0xafa5974b main (a0781331)    file: /home/dap/demo/loop.js    posn: line 10    arg1: a0784db5 (JSObject)804768c 0xafa53c9c  (as ) (a0781285)    file: /home/dap/demo/loop.js    posn: line 1    arg1: a077f6cd (JSObject)    arg2: a0780ae1 (JSFunction)    arg3: a077f651 (JSObject)    arg4: a077e091 (ConsString)    arg5: a0780f95 (ConsString)80476c0 0xafa141a5 80476f8 0xafa53697  (as Module._compile) (8e22e04d)    file: module.js    posn: line 374    arg1: a07807f9 (SeqAsciiString)    arg2: a077e091 (ConsString)8047718 0xafa4f799  (as Module._extensions..js) (8e22e08d)    file: module.js    posn: line 472    arg1: a077f651 (JSObject)    arg2: a077e091 (ConsString)804773c 0xafa4c73f  (as Module.load) (8e22e005)    file: module.js    posn: line 346    arg1: a077e091 (ConsString)804777c 0xafa3fa37  (as Module._load) (8e22dfbd)    file: module.js    posn: line 275    arg1: a0746315 (ConsString)    arg2: 8e208081 (Oddball: "null")    arg3: 8e2080a1 (Oddball: "true")804779c 0xafa3f593  (as Module.runMain) (8e22e0d5)    file: module.js    posn: line 49580477d0 0xafa298f6 startup (a072461d)    file: node.js    posn: line 3080477ec 0xafa2910b  (as ) (a0723f95)    file: node.js    posn: line 27    arg1: a071e9a9 (JSObject)804780c 0xafa0fe19 8047848 0xafa0a0aa 80478c8 _ZN2v88internalL6InvokeEbNS0_6HandleINS0_10JSFunctionEEENS1_INS0_6ObjectEEEiPS5_Pb+0x1018047908 v8::internal::Execution::Call+0xc98047968 v8::Function::Call+0x10b80479d8 node::Load+0x14e8047a38 node::Start+0x1588047a58 main+0x1b8047a70 _start+0x83

Now with each of the JavaScript stack frames, you have the filename and linenumber where the function was defined, along with the arguments it was passed.You can print these arguments by taking the pointer value (the hex number) andusing ::jsprint. Here's how you'd print arg1 to our main() function (thefourth frame down):

> a0784db5::jsprint{    hello: "world",}

which is exactly what we passed.

Finding JavaScript objects

MDB can find any JavaScript object in your program. It does this by scanningthe entire JavaScript heap and classifying objects by their properties. You dothis with ::findjsobjects. This may take a minute, then it prints out asummary of what it found:

> ::findjsobjects  OBJECT #OBJECTS   #PROPS CONSTRUCTOR: PROPS8e2254c9        1        0 Buffera071f289        1        0  (as )8e21a321        1        0  (as d)8e2154a9        1        0 MathConstructor8e209b31        1        0 JSON8e22d401        1        0 Modulea0708081       26        0a0715901       35        0 Array8e2093cd      329        0 Objecta074a61d        1        1 Object: ...a074a579        1        1 Object: /home/dap/demo/loop.jsa0711295        1        1 Arguments: length...9c912f19        2        5 Object: evals, natives, buffer, fs, constants9c9109d5        2        5 Object: cflags, default_configuration, defines, ...a074aea5        1       11 Array9c90ab55        1       11 process: version, moduleLoadList, versions, ...9c90aae9        2        6 Object: debug, uv, ipv6, tls_npn, tls_sni, tls9c90dadd        1       13 Object: _debugger, _linklist, assert, buffer, ...9c910bcd        1       13 Object: clang, gcc_version, host_arch, ...9c912ef5        1       13 Object: O_RDONLY, O_WRONLY, O_RDWR, S_IFMT, ...9c9148f9        2        7 Module: id, exports, parent, filename, loaded, ...a0718911        1       16 Array9c90a885        2        8 Object: http_parser, node, v8, ares, uv, zlib, ...9c9148b1        2       19 process: version, moduleLoadList, versions, ......
Each row describes a group of objects that have the same set of properties, so
:findjsobjects groups them together. The first column in each row is a pointerto a representative object for that group. You can print those out with

:jsprint. For example, this one appears to be used internally by Node:

a074a579::jsprint{/home/dap/demo/loop.js: {id: ".",exports: [...],parent: null,filename: "/home/dap/demo/loop.js",loaded: false,children: [...],paths: [...],},}

Once you've run "::findjsobjects" once, it keeps track of what it found beforeso that you can quickly look for more specific kinds of objects. You usuallysearch by a property name. For example, if I wanted to find my "hello world"object (and I didn't know it was on the stack), I could find it with:

> ::findjsobjects -p helloa0784d79

and print it out:

> a0784d79::jsprint{    hello: "world",}

This is a good time to point out that MDB supports pipelines, similar to mostshells. So you can combine these two commands:

> ::findjsobjects -p hello | ::jsprint{    hello: "world",}

Dealing with garbage

The mechanism that "::findjsobjects" uses means that it also tends to pick up"garbage" objects that used to exist in your program, but which are no longervalid. In my process, I can see some of these when I try to look for Node'sglobal "process" object, which I know has a versions property:

> ::findjsobjects -p versions | ::jsprintmdb: 8e21f06d: unknown JavaScript object type "AccessorInfo"{    version: 11,    moduleLoadList: 0,    versions: "title",    arch: 1572995,    platform: ,    argv: "version",    execArgv: 2621697,    env: 0,    pid: "moduleLoadList",    features: 1835393,}...

You can recognize garbage objects because you may see warnings when MDB printsthem out (e.g., "unknown JavaScript object type") and the properties lookvery wrong. In this case, we know the "versions" property is supposed to be anobject, not the string "title".

These garbage objects are annoying, but they're generally easily distinguishedfrom valid objects.

Representative objects

Note that "::findjsobjects" only prints out the representative object fromeach group of objects. Since you might have tens of thousands of instances ofthe same thing, it doesn't print all those by default. You have to pipe to::findjsobjects again to get that. If you wanted to actually find the global"process" object, the reliable way to do that is:

> ::findjsobjects -p versions | ::findjsobjects | ::jsprint...{    version: "v0.10.24",    moduleLoadList: [        "Binding evals",        "Binding natives",        "NativeModule events",...    ],    versions: {        http_parser: "1.0",        node: "0.10.24",        v8: "3.14.5.9",        ares: "1.9.0-DEV",        uv: "0.10.21",        zlib: "1.2.3",        modules: "11",        openssl: "1.0.1e",    },    arch: "ia32",    platform: "sunos",    argv: [        "node",        "/home/dap/demo/loop.js",    ],...}

I clipped the output for brevity. There were a few garbage objects, and the"process" object itself has a lot of stuff in it.

Where to go from here

This example demonstrated a bunch of the basics around using MDB to inspectprogram state. The best way to get familiar with these commands is to use themon your own programs, ideally to debug real problems. But even if you don'thave a real problem, try creating a core file of a production service and pokingaround. You might be surprised at what you find.

Debugging actual issues by inspecting state is very different than bothconsole.log debugging and tracing execution with breakpoints. It's much morepowerful in some ways, since it shows you much more information and doesn'tdisrupt the running program. When debugging problems this way, the approachmirrors the scientific method:

  • Form a hypothesis about what happened.
  • Inspect the state of the core file to confirm or disprove the hypothesis.
  • If proved, you're done. If not, refine the hypothesis and repeat.

The hard part is learning to think about problems this way and to developcreative ways to prove or disprove hypotheses. Sometimes, it's easy: if youthink some socket got closed, you can find the socket object, inspect its state,and test that hypothesis. Sometimes, it's not so easy: if you think aparticular function may have been called that shouldn't have been, it may not beeasy to confirm that. But be creative: maybe the function doesn't set anyvariables, but does it instantiate any objects, even temporary ones? You may beable to find them with "::findjsobjects".

One nice thing about this approach is that by the end, you can be confident inyour root-cause, because you've got lots of data to back it up. Often, the datamay point to a very specific combination of events (e.g., this kind of requesthappening concurrently with that other kind of request) that may help youreproduce the problem in development, so you can be sure that your fix works.

Building programs for debugging

You can help yourself by including state in your program that only exists tohelp debug it. For example, thevasync library providesfamiliar control-flow functions (like pipeline and forEachParallel), withthe added benefit that the state is stored in a JavaScript object (rather thanlocal variables contained in a closure). The upshot is that when your Nodeprogram hangs while running a pipeline of 10 functions, it's easy to use"::findjsobjects" to find out exactly which function didn't complete.

Other tips:

  • Save a record of all outstanding operations. Keep them in an object indexedby request id.
  • Save a timestamp when you start each operation, particularly outgoingrequests. This helps you determine whether a request has been hung (i.e., theexternal service is hung).

More examples

These are some examples of using MDB to analyze complex issues in Node itself:

Notes

Debugging crashes in Node programs

With Node v0.10.8 and later, you can use the --abort-on-uncaught-exceptionflag to cause your Node program to dump core when any uncaught exception isthrown. This way, when your program would have crashed, you instead get a coredump that you can debug with MDB.

Line numbering

The line numbers reported by MDB describe where the function was defined, notwhere the function called the next function in the stack.

In some cases, instead of seeing a line number, you may see a position number,as in "position 13546". MDB can't always tell what line number a positioncorresponds to. It can only tell when V8 computed this before the core file wascreated. In this case, the function would have been defined position 13546 inthe file. In "vim", you can find this with :goto 13546. Beware that Nodeprepends about 50 characters to all files before passing them to V8, so theseposition numbers appear to be off by about that much.

Processes or core files?

MDB operates on either running processes or core dumps. In turn, core dumpscan either be generated automatically in response to a crash, or on-demandusing gcore. The MDB commands and output arebasically the same whether you're debugging a live process or a core file --it's just the way you start up MDB that's different.

To debug a core dump, use:

$ mdb /path/to/core/file

To debug a process, use:

$ mdb -p PID

Once MDB is started, the rest of the commands are the same.

When you use mdb -p, MDB stops the process. Be careful doing this inproduction! It's usually better to create a core file first, and then debugthat:

$ gcore PIDcore.PID dumped$ mdb core.PID

This still pauses the process for as long as it takes gcore to run.

Special stack frames

There are a bunch of V8-specific frames you may see that you can usually ignore.These include:

  • ArgumentsAdapterFrame: inserted by V8 when a function's caller passes adifferent number of arguments than the function is defined to take.
  • ConstructFrame: used when invoking a constructor
  • EntryFrame: transition from C to JavaScript
  • ExitFrame: transition from JavaScript to C

Other frames include InternalFrames, EntryConstructFrames, and so on.

What's this "hole" object?

In ::jsprint output, you may see values called hole. You can think of"hole" as undefined. You usually see it in arrays where the array itselfhasn't been fully populated.

JavaScript requires that V8 be able to distinguish between array elements thathave not been assigned a value and those that have been assigned"undefined".3 V8 uses the special "hole" value for this internally.

For more help

On the specific MDB commands that work on Node, see:

> ::dmods -l v8

to list commands, and use ::help CMD for more information about each command.

On MDB, check out the MDB man page and theModular Debugger guide. Be awarethat MDB is also used as a kernel and low-level C debugger, so much of thematerial may be too low-level for Node.

Footnotes

1The debugger module is called "v8" because it's actually notNode-specific: everything it supports would work on other environments usingthe V8 VM.

2The first two columns of ::jsstack output are the frame pointerand instruction pointer for each stack frame. These can be useful whendigging into VM internals, or looking at native code, but can usually beignored for debugging JavaScript-level issues.

3You can see the difference yourself:

$ node> var x = new Array(3)undefined> x[0] = undefinedundefined> x[2] = undefinedundefined> x[ undefined, , undefined ]> 0 in xtrue> 1 in xfalse> 2 in xtrue


Post written by David Pacheco