Tracing Node.js add-on latency

October 17, 2013 - by David Pacheco

Node.js has some built-in DTrace probes for looking at HTTP request start/done and GC start/done, and I wrote nhttpsnoop to report latency based on these probes. There's also a future project that will make it easy to trace any asynchronous operations. But in the meantime, there's one category of latency that's missing, which is latency of asynchronous operations kicked off by add-ons.

As an example, I'll use node-sqlite3. Below, I have sqlite-demo.js, which is a version of the demo program that's been modified to run in a loop where it opens the database, inserts some records, selects some records, and closes the database. As this program runs, the add-on kicks off async operations to open the database, run queries, and close the database. We're going to trace those operations to see how long each type of operation takes.

Below, I have trace-addon-latency.d, a D script that traces the libuv uv_queue_work function as well as all functions in all add-ons in the target Node process. This is a little heavy-handed for what we're trying to do, and can have a noticeable impact on program performance, depending on the add-on. For many of the small add-ons I use, the impact is negligible.

By running the program and the D script for a few minutes, then stopping the D script, you get output that looks like this:

  node_sqlite3.node`node_sqlite3::Database::Work_BeginExec(node_sqlite3::Database::Baton*)
           value  ------------- Distribution ------------- count    
            2048 |                                         0        
            4096 |@@@@@@@@@@@@@@@@@                        6018     
            8192 |@@@@@                                    1590     
           16384 |@                                        302      
           32768 |@@@@@@@@@@@@@@@@@                        5722     
           65536 |                                         156      
          131072 |                                         0        

There's a histogram like this for each "kind" of asynchronous operation that each add-on kicks off. By "kind", I'm referring to the C++ function that kicks off the operation. In this case, Work_BeginExec kicked off 6018 operations that took between 4us and 8us.

This is a quick way to get a look at what async operations your Node program is doing (from add-ons) and how long they're taking. This works on OS X and SmartOS, and should work on BSD and any other system with DTrace.

Full steps to reproduce

$ mkdir demo
$ cd demo
$ npm install sqlite3
$ curl -O https://gist.github.com/davepacheco/104e29e6aa2a5a3c1dca/raw/bbcd859c7469ed20b7126c56f3a5c45dc43986e1/sqlite-demo.js
$ curl -O https://gist.github.com/davepacheco/104e29e6aa2a5a3c1dca/raw/45c7e0da1b5aeb0cced2163e525e7caffb1a6ebe/trace-addon-latency.d
$ node sqlite-demo.js &
$ dtrace -s trace-addon-latency.d -p $!     # you may need "sudo" here, particularly on OS X

Let that run for a little while, then kill either one of the "node" or "dtrace" processes. (Killing the "node" one causes "dtrace" to exit, since it was attached to it.) That will cause "dtrace" to exit and print the results.

sqlite-demo.js

var sqlite3 = require('sqlite3').verbose();
go();

function go()
{
    var db = new sqlite3.Database(':memory:');
    db.serialize(function () {
        db.run("CREATE TABLE lorem (info TEXT)");
        var stmt = db.prepare("INSERT INTO lorem VALUES (?)");
        for (var i = 0; i < 10; i++) {
            stmt.run("Ipsum " + i);
        }
        stmt.finalize();
        db.exec("SELECT rowid AS id, info FROM lorem", function () {
            db.close();
            go();
        });
    });
}

trace-addon-latency.d

#!/usr/sbin/dtrace -s

/*
 * trace-addon-latency.d: Run this script as:
 *
 *     dtrace -s trace-addon-latency.d -p PID
 *
 * to trace the latency of asynchronous operations kicked off by all add-ons in
 * the node process PID.  NOTE: this will instrument all native functions in all
 * add-ons, which can have a noticeable performance impact, depending on the
 * add-on.
 */

pid$target::uv_queue_work:entry
{
    callers[arg1] = ucaller;
    donefunc[arg1] = arg3;
    starttime[arg1] = timestamp;
}

pid$target:*.node::entry
/donefunc[arg0]/
{
    @[ufunc(callers[arg0])] = quantize(timestamp - starttime[arg0]);
    callers[arg0] = 0;
    starttime[arg0] = 0;
    donefunc[arg0] = 0;
}

trace-addon-latency.out

  node_sqlite3.node`node_sqlite3::Database::Work_BeginClose(node_sqlite3::Database::Baton*)
           value  ------------- Distribution ------------- count    
            2048 |                                         0        
            4096 |                                         167      
            8192 |@@@@@@@@@@@@@@@@@@@@@@@@@                8529     
           16384 |@@@@@@@                                  2326     
           32768 |@@@@@@@@                                 2740     
           65536 |                                         26       
          131072 |                                         0        

  node_sqlite3.node`node_sqlite3::Database::Work_BeginExec(node_sqlite3::Database::Baton*)
           value  ------------- Distribution ------------- count    
            2048 |                                         0        
            4096 |@@@@@@@@@@@@@@@@@                        6018     
            8192 |@@@@@                                    1590     
           16384 |@                                        302      
           32768 |@@@@@@@@@@@@@@@@@                        5722     
           65536 |                                         156      
          131072 |                                         0        

  node_sqlite3.node`node_sqlite3::Database::Work_BeginOpen(node_sqlite3::Database::Baton*)
           value  ------------- Distribution ------------- count    
            2048 |                                         0        
            4096 |@                                        286      
            8192 |@@@@@@@@@@@@@@@@@@                       6064     
           16384 |@@                                       642      
           32768 |@@@@@@@@@@@@@@@@@@@                      6701     
           65536 |                                         94       
          131072 |                                         1        
          262144 |                                         0        

  node_sqlite3.node`node_sqlite3::Statement::Work_BeginPrepare(node_sqlite3::Database::Baton*)
           value  ------------- Distribution ------------- count    
            2048 |                                         0        
            4096 |@@@@@                                    3604     
            8192 |@@@@@@@@@@@@                             8376     
           16384 |@@                                       1341     
           32768 |@@@@@@@@@@@@@@@@                         10831    
           65536 |@@@@@                                    3423     
          131072 |                                         2        
          262144 |                                         0        

  node_sqlite3.node`node_sqlite3::Statement::Work_BeginRun(node_sqlite3::Statement::Baton*)
           value  ------------- Distribution ------------- count    
            2048 |                                         0        
            4096 |@@@@@@@@@@@@@@@@@@@@@@@                  87884    
            8192 |@@@@@                                    19478    
           16384 |@@                                       8692     
           32768 |@@@@@@@@                                 32160    
           65536 |@                                        3455     
          131072 |                                         0      
:

Sign up now for Instant Cloud Access Get Started