Watch it Wednesdays: Surge 2012 - When Node.js Goes Wrong

February 13, 2013 - by joyentdeirdre

The event-oriented approach underlying Node.js enables significant concurrency using a deceptively simple programming model, which has been an important factor in Node's growing popularity for building large scale web services. But what happens when these programs go sideways? Even in the best cases, when such issues are fatal, developers have historically been left with just a stack trace. Subtler issues, including latency spikes (which are just as bad as correctness bugs in the real-time domain where Node is especially popular) and other buggy behavior often leave even fewer clues to aid understanding.

In this talk, we will discuss the issues we encountered in debugging Node.js in production, focusing upon the seemingly intractable challenge of extracting runtime state from the black hole that is a modern JIT'd VM. We will describe the tools we've developed for examining this state, which operate on running programs (via DTrace), as well as VM core dumps (via a postmortem debugger). Finally, we will describe several nasty bugs we encountered in our own production environment.