Production Practices



Node.js has clustering built into its core. With the cluster module, your application can spin up as many workers as it needs to handle the load it receives, although the general recommendation is to match the number of workers to the number of threads or logical cores available in your environment.

The logic in your master process can be as simple or as complex as you want, but at the very least it must be able to spin up workers when needed. Ideally, the master also keeps track of worker exits and decides whether new workers need to be created.

The logic that makes up a worker depends on your application, but any listening socket opened in a worker is shared among all workers in the pool. The design lets you write your logic once and spin up as many workers as you need without having to synchronize sockets yourself. If a worker calls close() on a listening socket, that only tells the pool to stop delivering new requests to that worker; the other workers continue to receive new requests from that socket.
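
As a rough illustration of the master and worker roles described above, the following sketch forks one worker per logical core and replaces any worker that exits unexpectedly. The function names (workerCount, shouldRespawn, startMaster, startWorker, main) and port 8000 are assumptions made for this example, not part of the cluster API. Note that cluster.isPrimary is the current name for what older Node releases call cluster.isMaster.

```javascript
'use strict';
const cluster = require('node:cluster');
const http = require('node:http');
const os = require('node:os');

// One worker per logical core. os.availableParallelism() only exists on
// newer Node releases, so fall back to os.cpus().length when it is missing.
function workerCount() {
  const n = typeof os.availableParallelism === 'function'
    ? os.availableParallelism()
    : os.cpus().length;
  return Math.max(1, n);
}

// A worker that went away via disconnect() or kill() was retired on purpose;
// anything else is treated as a crash that should be replaced.
function shouldRespawn(worker) {
  return !worker.exitedAfterDisconnect;
}

// Master: fork the pool and keep it at full strength.
function startMaster() {
  const count = workerCount();
  for (let i = 0; i < count; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker, code, signal) => {
    console.error(`worker ${worker.process.pid} exited (${signal || code})`);
    if (shouldRespawn(worker)) {
      cluster.fork();
    }
  });
}

// Worker: every worker listens on the same port; cluster shares the socket.
function startWorker(port) {
  return http.createServer((req, res) => {
    res.end(`handled by pid ${process.pid}\n`);
  }).listen(port);
}

// Hypothetical entry point; port 8000 is an arbitrary choice.
function main() {
  if (cluster.isPrimary) {
    startMaster();
  } else {
    startWorker(8000);
  }
}

module.exports = { workerCount, shouldRespawn, startMaster, startWorker, main };
```

Calling main() at the bottom of your entry file would start the pool. Keeping the respawn decision in its own small function makes it easy to add policy later, such as refusing to restart workers that crash repeatedly within a short window.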


Domain is a useful module built into the core of Node that allows developers to gracefully handle unexpected errors for asynchronous events. That is to say, you can group related asynchronous IO operations together and if one of them errors in an unexpected way, that error won't necessarily disrupt other IO happening in the event queue.

One of the primary use cases for domains is hitting an unexpected error while handling an HTTP request when other users still have requests being processed by your application. If you are in this scenario and have deployed with cluster, you should immediately close() the listener in the worker, report the error, and let the other requests finish before exiting gracefully.

Since the error you encountered was unexpected, it's not entirely clear just what might have happened to the rest of your state. You could use the error handler to roll back transactions, and try to repair as much as possible, but it's unlikely that you can guarantee that the process is in a state where it's truly safe to continue. Consider calling process.abort() instead to save a core file for later debugging, and allow the system to restart your process.
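
The pattern described above might look roughly like the sketch below: each request runs in its own domain, and an unexpected error causes the worker to stop accepting connections, report the error, and exit once in-flight requests finish. The helper names (handleInDomain, makeShutdown), the graceMs and exit options, and the 30 second default are assumptions for illustration. Be aware that current Node.js documentation marks the domain module as deprecated, although it remains available.

```javascript
'use strict';
const domain = require('node:domain');

// Run one request handler inside its own domain so that an unexpected
// asynchronous error is routed to onFatal instead of crashing immediately.
function handleInDomain(req, res, handler, onFatal) {
  const d = domain.create();
  d.add(req);
  d.add(res);
  d.on('error', (err) => onFatal(err, req, res));
  d.run(() => handler(req, res));
  return d;
}

// Build the onFatal callback for a given server. The `exit` option defaults
// to process.exit; passing () => process.abort() instead would leave a core
// file for later debugging.
function makeShutdown(server, options = {}) {
  const graceMs = options.graceMs || 30000; // arbitrary default
  const exit = options.exit || process.exit;
  let shuttingDown = false;

  return function onFatal(err, req, res) {
    // Report the error.
    console.error('unexpected error, retiring worker:', err.stack || err);

    // Best effort: tell the affected client that its request failed.
    try {
      if (!res.headersSent) res.statusCode = 500;
      res.end('Internal Server Error\n');
    } catch (ignored) {
      // The response may already be unusable.
    }

    if (shuttingDown) return;
    shuttingDown = true;

    // Failsafe so a stuck connection cannot keep the worker alive forever.
    const timer = setTimeout(() => exit(1), graceMs);
    timer.unref();

    // Stop accepting new connections; exit once existing ones are done.
    server.close(() => {
      clearTimeout(timer);
      exit(1);
    });
  };
}

module.exports = { handleInDomain, makeShutdown };
```

In a worker, this would be wired up by creating the shutdown callback once for the HTTP server and calling handleInDomain(req, res, yourHandler, onFatal) from the server's request listener. Exiting with a non-zero code lets the master, or a service supervisor, see the failure and start a replacement.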

Deploying New Versions

When it comes to deploying new versions of your application, avoid an in-process mechanism for reloading your code. Such techniques are fraught with dangerous edge cases that can lead to memory and state corruption errors that are difficult to debug. Reproducible deployment is critical in production systems, and is best accomplished by starting with a clean slate each time.

If your application cannot sustain downtime while a new version of your software rolls out, use the cluster module to distribute the new version to workers. One possible solution is for the master process to listen for the SIGHUP signal. When the master receives the signal, it should fork new workers that run the new application logic and tell the existing workers to close their listeners. When the existing workers close their listeners, they simply stop accepting new connections but allow any current operations to run to completion. Once the existing operations have completed, the worker is free to exit cleanly.
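
One way the SIGHUP approach could be sketched is shown below. It assumes the new version has already been deployed to the same path the master forks from, so that freshly forked workers load the new code from disk. The function names reloadWorkers and installReloadHandler are hypothetical, and the clusterApi parameter exists only so the logic can be exercised without forking real processes.

```javascript
'use strict';
const cluster = require('node:cluster');

// Fork one replacement per existing worker, and retire each old worker only
// after its replacement reports that it is listening.
function reloadWorkers(clusterApi = cluster) {
  const oldWorkers = Object.values(clusterApi.workers);
  for (const oldWorker of oldWorkers) {
    const replacement = clusterApi.fork();
    replacement.once('listening', () => {
      // disconnect() asks the worker to close its servers, let existing
      // connections finish, and then drop its IPC channel so it can exit.
      oldWorker.disconnect();
    });
  }
  return oldWorkers.length;
}

// Master only: trigger a rolling reload whenever SIGHUP arrives.
function installReloadHandler() {
  process.on('SIGHUP', () => {
    const count = reloadWorkers();
    console.error(`SIGHUP received, replacing ${count} workers`);
  });
}

module.exports = { reloadWorkers, installReloadHandler };
```

With this in place, sending SIGHUP to the master's process ID would start the rollover. Waiting for the 'listening' event before retiring an old worker keeps capacity steady throughout the rollover, at the cost of briefly running twice as many processes.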

Service Management (Docker)

The main process inside a Docker container is responsible for behaving like an init process. To help with this and other responsibilities, Joyent has developed ContainerPilot. ContainerPilot uses a configuration file to indicate which processes it needs to start and at what frequency. This works well for Node.js applications, where there is a main process alongside other processes that set up configuration and perform cleanup operations. Please refer to the nodejs-example project for an example of how to use ContainerPilot with a set of Node.js service containers.

Service Management (SmartOS)

Once your application is deployed on your server you will need a way to run it and to make sure it stays running. Enter the Service Management Facility: a framework for supervising your application. SMF is designed to ensure that your application is started only after other services that it depends on, and that if your application stops running it is immediately restarted.

SMF is feature-rich, including safeguards against your application failing too quickly due to configuration problems or other failures that require operator intervention to fix. Every SMF service has a log file that captures console.log and console.error output from your application, as well as the times of service failures and restarts. You can easily query the status of your service with a simple command-line tool, including a list of process IDs for your running node processes without needing to mess around with pid files.

Services are defined and configured with a straightforward XML format; an example for use with your Node application can be found in this repository. There is also a simple, JSON-based SMF configuration tool called smfgen which will work with many Node applications, without the need to write or edit any XML.

After you've modified the template to match your running user and the filesystem paths to point to where you've deployed your application, import the service into SMF: svccfg import nodeapp-manifest.xml. Once your manifest has been successfully imported, you can perform various actions:

  • Enable your application now, and at next reboot, with: svcadm enable nodeapp
  • Disable your application with: svcadm disable nodeapp
  • Query the current status of your application with: svcs nodeapp
  • Find the SMF log file for your service with: svcs -L nodeapp
  • List the process IDs of your running node processes: svcs -p nodeapp

Read More about Service Management

Dependency Management

It's important to consider your dependency tree when preparing to deploy your Node application. NPM is a fast-paced and growing community. Module authors are constantly improving their modules and publishing new versions, and their dependent modules are being updated just as quickly. This moving target can complicate your deployment if module versions and behaviors change subtly between development and deployment, or even from one deploy to the next in a scaled environment.

While it's possible to cache your node_modules directory for reuse during deployment, it may not always be ideal. You may develop in a different environment from the one you deploy to. If you are in a heterogeneous environment, any modules that depend on a binary dependency will likely break, since those modules are platform-specific. Also, depending on the size and kind of deployment, it may not be feasible to store every combination of modules required for your environments.

To handle these scenarios during deployment, use npm shrinkwrap. Shrinkwrap walks the dependency tree and, at each level, records the module, its version, and the module's dependencies and their versions. The results are written to npm-shrinkwrap.json, which can safely be checked into source control. When you deploy your application with this file in the same directory as your package.json, npm fetches the exact versions that were recorded when you created the shrinkwrap file.
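
To make the recorded tree easier to picture, here is a small sketch that walks a parsed shrinkwrap object and lists every pinned module. It assumes the layout in which each entry carries a version and an optional nested dependencies object; newer npm lockfile formats also use other structures that this sketch does not handle. The function name listPinned and the module names in the usage note are hypothetical.

```javascript
'use strict';

// Flatten the nested "dependencies" objects into "path@version" strings,
// visiting each level of the tree in alphabetical order.
function listPinned(shrinkwrap, prefix = '') {
  const pinned = [];
  const deps = shrinkwrap.dependencies || {};
  for (const name of Object.keys(deps).sort()) {
    const path = prefix ? `${prefix} > ${name}` : name;
    pinned.push(`${path}@${deps[name].version}`);
    // Recurse into this module's own pinned dependencies, if any.
    pinned.push(...listPinned(deps[name], path));
  }
  return pinned;
}

module.exports = { listPinned };
```

Given a parsed file in which a module named web-framework is pinned at 1.2.3 and depends on query-parser at 0.4.0, listPinned would return the entries "web-framework@1.2.3" and "web-framework > query-parser@0.4.0". Comparing this output between two commits is one quick way to review what a dependency update actually changed.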

Read More about Dependency Management