Tracing Node.js add-on latency

Node.js has some built-in DTrace probes for looking at HTTP request start/done and GC start/done, and I wrote nhttpsnoop to report latency based on these probes. There's also a future project that will make it easy to trace any asynchronous operations. But in the meantime, there's one category of latency that's missing, which is latency of asynchronous operations kicked off by add-ons.

As an example, I'll use node-sqlite3. Below, I have sqlite-demo.js, which is a version of the demo program that's been modified to run in a loop where it opens the database, inserts some records, selects some records, and closes the database. As this program runs, the add-on kicks off async operations to open the database, run queries, and close the database. We're going to trace those operations to see how long each type of operation takes.

Below, I have trace-addon-latency.d, a D script that traces the libuv uv_queue_work function as well as all functions in all add-ons in the target Node process. This is a little heavy-handed for what we're trying to do, and can have a noticeable impact on program performance, depending on the add-on. For many of the small add-ons I use, the impact is negligible.

By running the program and the D script for a few minutes, then stopping the D script, you get output that looks like this:

  node_sqlite3.node`node_sqlite3::Database::Work_BeginExec(node_sqlite3::Database::Baton*)           value  ------------- Distribution ------------- count                2048 |                                         0                    4096 |@@@@@@@@@@@@@@@@@                        6018                 8192 |@@@@@                                    1590                16384 |@                                        302                 32768 |@@@@@@@@@@@@@@@@@                        5722                65536 |                                         156                131072 |                                         0        

There's a histogram like this for each "kind" of asynchronous operation that each add-on kicks off. By "kind", I'm referring to the C++ function that kicks off the operation. In this case, Work_BeginExec kicked off 6018 operations that took between 4us and 8us.

This is a quick way to get a look at what async operations your Node program is doing (from add-ons) and how long they're taking. This works on OS X and SmartOS, and should work on BSD and any other system with DTrace.

Full steps to reproduce

$ mkdir demo$ cd demo$ npm install sqlite3$ curl -O https://gist.github.com/davepacheco/104e29e6aa2a5a3c1dca/raw/bbcd859c7469ed20b7126c56f3a5c45dc43986e1/sqlite-demo.js$ curl -O https://gist.github.com/davepacheco/104e29e6aa2a5a3c1dca/raw/45c7e0da1b5aeb0cced2163e525e7caffb1a6ebe/trace-addon-latency.d$ node sqlite-demo.js &$ dtrace -s trace-addon-latency.d -p $!     # you may need "sudo" here, particularly on OS X

Let that run for a little while, then kill either one of the "node" or "dtrace" processes. (Killing the "node" one causes "dtrace" to exit, since it was attached to it.) That will cause "dtrace" to exit and print the results.

sqlite-demo.js

var sqlite3 = require('sqlite3').verbose();go();function go(){    var db = new sqlite3.Database(':memory:');    db.serialize(function () {        db.run("CREATE TABLE lorem (info TEXT)");        var stmt = db.prepare("INSERT INTO lorem VALUES (?)");        for (var i = 0; i < 10; i++) {            stmt.run("Ipsum " + i);        }        stmt.finalize();        db.exec("SELECT rowid AS id, info FROM lorem", function () {            db.close();            go();        });    });}

trace-addon-latency.d

#!/usr/sbin/dtrace -s/* * trace-addon-latency.d: Run this script as: * *     dtrace -s trace-addon-latency.d -p PID * * to trace the latency of asynchronous operations kicked off by all add-ons in * the node process PID.  NOTE: this will instrument all native functions in all * add-ons, which can have a noticeable performance impact, depending on the * add-on. */pid$target::uv_queue_work:entry{    callers[arg1] = ucaller;    donefunc[arg1] = arg3;    starttime[arg1] = timestamp;}pid$target:*.node::entry/donefunc[arg0]/{    @[ufunc(callers[arg0])] = quantize(timestamp - starttime[arg0]);    callers[arg0] = 0;    starttime[arg0] = 0;    donefunc[arg0] = 0;}

trace-addon-latency.out

  node_sqlite3.node`node_sqlite3::Database::Work_BeginClose(node_sqlite3::Database::Baton*)           value  ------------- Distribution ------------- count                2048 |                                         0                    4096 |                                         167                  8192 |@@@@@@@@@@@@@@@@@@@@@@@@@                8529                16384 |@@@@@@@                                  2326                32768 |@@@@@@@@                                 2740                65536 |                                         26                 131072 |                                         0          node_sqlite3.node`node_sqlite3::Database::Work_BeginExec(node_sqlite3::Database::Baton*)           value  ------------- Distribution ------------- count                2048 |                                         0                    4096 |@@@@@@@@@@@@@@@@@                        6018                 8192 |@@@@@                                    1590                16384 |@                                        302                 32768 |@@@@@@@@@@@@@@@@@                        5722                65536 |                                         156                131072 |                                         0          node_sqlite3.node`node_sqlite3::Database::Work_BeginOpen(node_sqlite3::Database::Baton*)           value  ------------- Distribution ------------- count                2048 |                                         0                    4096 |@                                        286                  8192 |@@@@@@@@@@@@@@@@@@                       6064                16384 |@@                                       642                 32768 |@@@@@@@@@@@@@@@@@@@                      6701                65536 |                                         94                 131072 |                                         1                  262144 |                                         0          node_sqlite3.node`node_sqlite3::Statement::Work_BeginPrepare(node_sqlite3::Database::Baton*)           value  ------------- Distribution ------------- count                2048 |                                         0                    4096 |@@@@@                                    3604                 8192 |@@@@@@@@@@@@                             8376                16384 |@@                                       1341                32768 |@@@@@@@@@@@@@@@@                         10831               65536 |@@@@@                                    3423               131072 |                                         2                  262144 |                                         0          node_sqlite3.node`node_sqlite3::Statement::Work_BeginRun(node_sqlite3::Statement::Baton*)           value  ------------- Distribution ------------- count                2048 |                                         0                    4096 |@@@@@@@@@@@@@@@@@@@@@@@                  87884                8192 |@@@@@                                    19478               16384 |@@                                       8692                32768 |@@@@@@@@                                 32160               65536 |@                                        3455               131072 |                                         0      


Post written by David Pacheco