Tracing Node.js add-on latency
Node.js has some built-in DTrace probes for looking at HTTP request start/done and GC start/done, and I wrote nhttpsnoop to report latency based on these probes. There's also a future project that will make it easy to trace any asynchronous operations. But in the meantime, there's one category of latency that's missing, which is latency of asynchronous operations kicked off by add-ons.
As an example, I'll use node-sqlite3. Below, I have sqlite-demo.js, which is a version of the demo program that's been modified to run in a loop where it opens the database, inserts some records, selects some records, and closes the database. As this program runs, the add-on kicks off async operations to open the database, run queries, and close the database. We're going to trace those operations to see how long each type of operation takes.
Below, I have trace-addon-latency.d, a D script that traces the libuv uv_queue_work
function as well as all functions in all add-ons in the target Node process. This is a little heavy-handed for what we're trying to do, and can have a noticeable impact on program performance, depending on the add-on. For many of the small add-ons I use, the impact is negligible.
By running the program and the D script for a few minutes, then stopping the D script, you get output that looks like this:
node_sqlite3.node`node_sqlite3::Database::Work_BeginExec(node_sqlite3::Database::Baton*) value ------------- Distribution ------------- count 2048 | 0 4096 |@@@@@@@@@@@@@@@@@ 6018 8192 |@@@@@ 1590 16384 |@ 302 32768 |@@@@@@@@@@@@@@@@@ 5722 65536 | 156 131072 | 0
There's a histogram like this for each "kind" of asynchronous operation that each add-on kicks off. By "kind", I'm referring to the C++ function that kicks off the operation. In this case, Work_BeginExec kicked off 6018 operations that took between 4us and 8us.
This is a quick way to get a look at what async operations your Node program is doing (from add-ons) and how long they're taking. This works on OS X and SmartOS, and should work on BSD and any other system with DTrace.
Full steps to reproduce
$ mkdir demo$ cd demo$ npm install sqlite3$ curl -O https://gist.github.com/davepacheco/104e29e6aa2a5a3c1dca/raw/bbcd859c7469ed20b7126c56f3a5c45dc43986e1/sqlite-demo.js$ curl -O https://gist.github.com/davepacheco/104e29e6aa2a5a3c1dca/raw/45c7e0da1b5aeb0cced2163e525e7caffb1a6ebe/trace-addon-latency.d$ node sqlite-demo.js &$ dtrace -s trace-addon-latency.d -p $! # you may need "sudo" here, particularly on OS X
Let that run for a little while, then kill either one of the "node" or "dtrace" processes. (Killing the "node" one causes "dtrace" to exit, since it was attached to it.) That will cause "dtrace" to exit and print the results.
sqlite-demo.js
var sqlite3 = require('sqlite3').verbose();go();function go(){ var db = new sqlite3.Database(':memory:'); db.serialize(function () { db.run("CREATE TABLE lorem (info TEXT)"); var stmt = db.prepare("INSERT INTO lorem VALUES (?)"); for (var i = 0; i < 10; i++) { stmt.run("Ipsum " + i); } stmt.finalize(); db.exec("SELECT rowid AS id, info FROM lorem", function () { db.close(); go(); }); });}
trace-addon-latency.d
#!/usr/sbin/dtrace -s/* * trace-addon-latency.d: Run this script as: * * dtrace -s trace-addon-latency.d -p PID * * to trace the latency of asynchronous operations kicked off by all add-ons in * the node process PID. NOTE: this will instrument all native functions in all * add-ons, which can have a noticeable performance impact, depending on the * add-on. */pid$target::uv_queue_work:entry{ callers[arg1] = ucaller; donefunc[arg1] = arg3; starttime[arg1] = timestamp;}pid$target:*.node::entry/donefunc[arg0]/{ @[ufunc(callers[arg0])] = quantize(timestamp - starttime[arg0]); callers[arg0] = 0; starttime[arg0] = 0; donefunc[arg0] = 0;}
trace-addon-latency.out
node_sqlite3.node`node_sqlite3::Database::Work_BeginClose(node_sqlite3::Database::Baton*) value ------------- Distribution ------------- count 2048 | 0 4096 | 167 8192 |@@@@@@@@@@@@@@@@@@@@@@@@@ 8529 16384 |@@@@@@@ 2326 32768 |@@@@@@@@ 2740 65536 | 26 131072 | 0 node_sqlite3.node`node_sqlite3::Database::Work_BeginExec(node_sqlite3::Database::Baton*) value ------------- Distribution ------------- count 2048 | 0 4096 |@@@@@@@@@@@@@@@@@ 6018 8192 |@@@@@ 1590 16384 |@ 302 32768 |@@@@@@@@@@@@@@@@@ 5722 65536 | 156 131072 | 0 node_sqlite3.node`node_sqlite3::Database::Work_BeginOpen(node_sqlite3::Database::Baton*) value ------------- Distribution ------------- count 2048 | 0 4096 |@ 286 8192 |@@@@@@@@@@@@@@@@@@ 6064 16384 |@@ 642 32768 |@@@@@@@@@@@@@@@@@@@ 6701 65536 | 94 131072 | 1 262144 | 0 node_sqlite3.node`node_sqlite3::Statement::Work_BeginPrepare(node_sqlite3::Database::Baton*) value ------------- Distribution ------------- count 2048 | 0 4096 |@@@@@ 3604 8192 |@@@@@@@@@@@@ 8376 16384 |@@ 1341 32768 |@@@@@@@@@@@@@@@@ 10831 65536 |@@@@@ 3423 131072 | 2 262144 | 0 node_sqlite3.node`node_sqlite3::Statement::Work_BeginRun(node_sqlite3::Statement::Baton*) value ------------- Distribution ------------- count 2048 | 0 4096 |@@@@@@@@@@@@@@@@@@@@@@@ 87884 8192 |@@@@@ 19478 16384 |@@ 8692 32768 |@@@@@@@@ 32160 65536 |@ 3455 131072 | 0
Post written by David Pacheco