Node.js comes with clustering built into the core. With the cluster module, your application can spin up as many workers as it needs to handle the load it receives, though it is generally recommended to match the number of workers to the number of threads or logical cores available in your environment.
The logic in your master process can be as simple or as complex as you want it to be; at the very least, it needs to be able to spin up your workers when needed. Ideally, your master also keeps track of when workers exit and decides whether new workers need to be created.
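The master's two responsibilities described above can be sketched roughly as follows. This is a minimal illustration, not a complete supervisor: the function names `defaultWorkerCount` and `superviseWorkers` are hypothetical, and the cluster object is passed in as a parameter only so the logic can be exercised with a stub.

```javascript
const cluster = require('cluster');
const os = require('os');

// One worker per logical core is the usual starting point.
function defaultWorkerCount() {
  return Math.max(1, os.cpus().length);
}

// Fork `count` workers and replace any worker that dies unexpectedly.
// `clusterApi` is normally the real cluster module (hypothetical parameter name).
function superviseWorkers(clusterApi, count) {
  clusterApi.on('exit', (worker, code, signal) => {
    // exitedAfterDisconnect is true when the exit was requested
    // (worker.disconnect() or worker.kill()), so no replacement is needed.
    if (worker.exitedAfterDisconnect) return;
    console.error(`worker ${worker.id} died (${signal || code}); forking a replacement`);
    clusterApi.fork();
  });
  for (let i = 0; i < count; i++) {
    clusterApi.fork();
  }
}

// Typical entry point, left commented out so loading this file forks nothing:
// if (cluster.isMaster) superviseWorkers(cluster, defaultWorkerCount());
```

A production master would probably also rate-limit re-forking, so that a worker that crashes on startup does not cause a tight fork loop.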
The logic that makes up a worker depends on your application, but any listening socket opened in a worker is shared among all workers in the pool. The design is such that you can write your logic once and spin up as many workers as you need without having to worry about synchronizing sockets yourself. If a worker calls close() on a listening socket, that only tells the pool to stop delivering new requests to that worker; the other workers continue to receive new requests from that socket.
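As a rough sketch of the worker side: each worker builds the same server, and a worker that is asked to stop closes only its own listener. The `'shutdown'` message name, the port in the comment, and the function names are assumptions made for this example, not part of the cluster API.

```javascript
const http = require('http');

// Build the server that every worker runs. Each worker calls listen() on the
// same port, and the cluster module arranges for them to share the socket.
function createWorkerServer(workerId) {
  return http.createServer((req, res) => {
    res.end(`handled by worker ${workerId}\n`);
  });
}

// When the master sends the (hypothetical) 'shutdown' message, stop accepting
// new connections in this worker only, then exit once the server has closed.
function closeOnShutdownMessage(server, proc) {
  proc.on('message', (msg) => {
    if (msg !== 'shutdown') return;
    server.close(() => proc.exit(0));
  });
}

// In a worker (commented out so loading this file opens no sockets):
// const server = createWorkerServer(require('cluster').worker.id);
// server.listen(8080); // hypothetical port
// closeOnShutdownMessage(server, process);
```

The other workers in the pool are unaffected by this close() and keep receiving new connections on the shared port.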
Domain is a useful module built into the core of Node that allows developers to gracefully handle unexpected errors for asynchronous events. That is to say, you can group related asynchronous IO operations together and if one of them errors in an unexpected way, that error won't necessarily disrupt other IO happening in the event queue.
One of the primary use cases for domains is the scenario where you hit an unexpected error while handling an HTTP request, but other users still have requests being processed by your application. If you are in this scenario and have deployed with cluster, you should immediately close() the listener in the worker, report the error, and let the other requests finish before exiting gracefully.
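The scenario above might look something like the following sketch. The names `beginGracefulExit` and `withDomain`, the 30-second grace period, and the injected `exit` callback are all assumptions made for illustration. Note also that current Node.js documentation marks the domain module as deprecated, so treat this as a picture of the pattern rather than a recommended implementation.

```javascript
const domain = require('domain');

// Stop taking new work, then exit once in-flight requests have finished.
// `exit` and `graceMs` are parameters of this sketch, not of any Node API.
function beginGracefulExit(server, exit = (code) => process.exit(code), graceMs = 30000) {
  // Failsafe: exit anyway if open connections never drain.
  const killTimer = setTimeout(() => exit(1), graceMs);
  killTimer.unref(); // do not keep the process alive just for this timer
  server.close(() => {
    clearTimeout(killTimer);
    exit(1);
  });
}

// Wrap a request handler so that each request runs in its own domain.
function withDomain(server, handler, onFatal = beginGracefulExit) {
  let shuttingDown = false;
  return (req, res) => {
    const d = domain.create();
    d.add(req);
    d.add(res);
    d.on('error', (err) => {
      // Report the error, answer the failed request, then start draining.
      console.error(`request failed unexpectedly: ${err.stack}`);
      try {
        res.statusCode = 500;
        res.end('Internal Server Error\n');
      } catch (err2) {
        console.error(`could not send error response: ${err2.stack}`);
      }
      if (!shuttingDown) {
        shuttingDown = true;
        onFatal(server);
      }
    });
    d.run(() => handler(req, res));
  };
}

// Usage (commented out so loading this file starts no server):
// const server = require('http').createServer();
// server.on('request', withDomain(server, myHandler)); // myHandler is hypothetical
```

Exiting with a non-zero code lets the master, or whatever supervises the process, see that the worker ended because of an error.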
Since the error you encountered was unexpected, it's not entirely clear just what might have happened to the rest of your state. You could use the error handler to roll back transactions and try to repair as much as possible, but it's unlikely that you can guarantee that the process is in a state where it's truly safe to continue. Consider calling process.abort() instead to save a core file for later debugging, and allow the system to restart your process.
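A minimal sketch of the abort approach follows. The function name `abortOnUnexpectedError` and its injectable `proc` and `log` parameters are assumptions made for this example; whether a core file is actually written also depends on operating system settings such as the core size limit.

```javascript
// Log the unexpected error, then abort so the operating system can write a
// core file and the service supervisor can restart the process.
function abortOnUnexpectedError(proc = process, log = console.error) {
  proc.on('uncaughtException', (err) => {
    log(`unexpected error, aborting: ${err && err.stack ? err.stack : err}`);
    proc.abort();
  });
}

// At startup (commented out so loading this file installs no handler):
// abortOnUnexpectedError();
```

Node.js also accepts the `--abort-on-uncaught-exception` command-line flag, which aborts at the point of the throw without any handler code; which of the two fits better depends on how much logging you want before the abort.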
When it comes to deploying new versions of your application, avoid an in-process mechanism for reloading your code. Such techniques are fraught with dangerous edge cases that can lead to memory and state corruption errors that are difficult to debug. Reproducible deployment is critical in production systems, and is best accomplished by starting with a clean slate each time.
If your application cannot sustain downtime while rolling out a new version of your software, use the cluster module to distribute the new version to workers. One possible solution is for the master process to listen for the SIGHUP signal. When the master receives the signal, it should fork new workers that have the new application logic, and tell the existing workers to close their listeners. When the existing workers close their listeners, they simply stop accepting new connections, but allow any current operations to run to completion. Once the existing operations have completed, the worker is free to exit cleanly.
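One way to sketch the SIGHUP approach is shown below. The function name `reloadWorkers` is hypothetical, and the sketch assumes that the new application code is already in place on disk before the signal is sent, so that freshly forked workers load it while the master itself keeps running its original code.

```javascript
const cluster = require('cluster');

// Replace every current worker with a freshly forked one.
// `clusterApi` is normally the real cluster module (hypothetical parameter name).
function reloadWorkers(clusterApi) {
  // Snapshot the current workers first so the replacements are not retired too.
  const oldWorkers = Object.values(clusterApi.workers);
  for (const old of oldWorkers) {
    const replacement = clusterApi.fork();
    // Retire the old worker only once its replacement is accepting connections.
    // disconnect() closes the old worker's listeners and lets in-flight work finish.
    replacement.once('listening', () => old.disconnect());
  }
}

// Wiring in the master (commented out so loading this file has no side effects):
// process.on('SIGHUP', () => reloadWorkers(cluster));
```

Waiting for the replacement's 'listening' event before disconnecting the old worker is a design choice of this sketch; it keeps at least one worker accepting connections throughout the rollout.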
The main process inside a Docker container is responsible for behaving like an init process. To help with this and other responsibilities, Joyent has developed ContainerPilot. ContainerPilot uses a configuration file to indicate which processes it needs to start and at what frequency. This works well for Node.js applications, where there is a main process alongside other processes that are required to set up configuration and perform cleanup operations. Please refer to the nodejs-example project for an example of how to use ContainerPilot with a set of Node.js service containers.
Once your application is deployed on your server you will need a way to run it and to make sure it stays running. Enter the Service Management Facility: a framework for supervising your application. SMF is designed to ensure that your application is started only after other services that it depends on, and that if your application stops running it is immediately restarted.
SMF is feature-rich, including safeguards against your application failing too quickly due to configuration problems or other failures that require operator intervention to fix. Every SMF service has a log file that captures console.error output from your application, as well as the times of service failures and restarts. You can easily query the status of your service with a simple command-line tool, including a list of process IDs for your running node processes without needing to mess around with pid files.
Services are defined and configured with a straightforward XML format; an example for use with your Node application can be found in this repository. There is also a simple, JSON-based SMF configuration tool called smfgen which will work with many Node applications, without the need to write or edit any XML.
After you've modified the template to match the user your application runs as and the filesystem paths where you've deployed it, import the service into SMF:
svccfg import nodeapp-manifest.xml
Once your manifest has been successfully imported, you can perform various actions:
Enable the service: svcadm enable nodeapp
Disable the service: svcadm disable nodeapp
Show the path to the service's log file: svcs -L nodeapp
List the processes belonging to the service: svcs -p nodeapp
It's important to consider your dependency tree when preparing to deploy your Node application. NPM is a fast-paced and growing community. Module authors are constantly improving their modules and publishing new versions, and the modules they depend on are being updated just as quickly. This moving target can complicate your deployment if module versions and behaviors change subtly between development and deployment, or even from one deploy to the next in a scaled environment.
While it's possible to cache your node_modules directory for reuse during deployment, it may not always be ideal. You may develop in a different environment from the one you deploy to. If you are in a heterogeneous environment, any modules that depend on a binary dependency will likely break, since those modules are platform-specific. Also, depending on the size and kind of deployment, it may not be feasible to store all the different combinations of modules required for your environments.
To handle these scenarios during deployment, use npm shrinkwrap. Shrinkwrap walks the dependency tree and, at each level, records the module, its version, and the module's dependencies and their versions. The results are then written to npm-shrinkwrap.json, which can safely be checked into your source control. When you deploy your application with this file in the same directory as your package.json, npm fetches the exact versions that were recorded at the time you created the shrinkwrap file.
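To make the recorded structure more concrete, the sketch below walks a shrinkwrap-shaped object and lists every pinned module. It assumes the classic nested layout, where each entry carries a `version` and an optional `dependencies` map; newer npm lockfile versions use a different layout. The function name `listPinned` and the example data are hypothetical and not taken from a real project.

```javascript
// List every pinned module in a shrinkwrap-shaped object as "path@version".
function listPinned(node, prefix = '') {
  const out = [];
  for (const [name, dep] of Object.entries(node.dependencies || {})) {
    const path = prefix ? `${prefix} > ${name}` : name;
    out.push(`${path}@${dep.version}`);
    // Recurse so that nested dependencies are listed under their parent.
    out.push(...listPinned(dep, path));
  }
  return out;
}

// Hypothetical example data in the assumed nested layout.
const exampleShrinkwrap = {
  name: 'nodeapp',
  version: '1.0.0',
  dependencies: {
    'module-a': {
      version: '2.1.0',
      dependencies: {
        'module-b': { version: '0.3.4' }
      }
    },
    'module-c': { version: '1.0.2' }
  }
};

console.log(listPinned(exampleShrinkwrap).join('\n'));
// module-a@2.1.0
// module-a > module-b@0.3.4
// module-c@1.0.2
```

Comparing this kind of listing between two deploys is one simple way to confirm that both were built from the same pinned versions.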