Node.js on the Road: Nate Fitch

Node.js on the Road is an event series aimed at sharing Node.js production user stories with the broader community. Watch for key learnings, benefits, and patterns around deploying Node.js.

Nate Fitch, Software Engineer

Hello, room full of people, my name is Nate Fitch. I am a developer for Joyent. I want to take a quick poll. How many people are running Node.js on Windows? It's OK, you don't have to be ashamed. OK, how many people are running on Linux? SmartOS? SmartOS? OK, so a couple in front. Alright, well, I hope that some of this will either convert you to SmartOS—oh, and then any other BSD variant?

OK, alright. So I hope some of this will be applicable to you. So I work for Joyent; we're the corporate sponsors of Node. We also run SmartOS. It's a Solaris derivative, and some of the things you may see are only available on SmartOS, so I wanted to take the poll. OK, so a little bit about me: I'm a software engineer.

I work on Joyent's Manta system. How many people have heard of Manta? Oh, man! You guys are killing me tonight.

[audience] Kartalytics!

Yes, Kartalytics. Also Minecrab.

So basically the whole talk is about Manta, and how we run Node in Manta. I came to Joyent because I wanted to work on Manta, and so some of the things that I did: I worked on the tools that run down in the Manta guts, I also did garbage collection and audit, and most recently I've been working on software deployment.

So, let me tell you a little bit about what Manta is, and give you a high-level architectural overview. How many people are familiar with S3? Oh, come on! OK, so like Amazon Web Services' or Azure's object store—Manta is an object store with built-in compute. It's basically an object store that has this MapReduce awesomeness built in.

And so this is kind of our high-level architecture, and I hope it will become clear as I talk what I mean by built-in compute, but there are a couple of things that you should note. It's probably your typical object store architecture. We have a front-end, so when a request comes in, of course it goes to the front-end.

It's called the front-end. We carry some auth information in our auth cache, we then check the index for where the objects are located on our storage nodes, and then we actually go to the storage nodes. On those storage nodes, we have a controller that will take the objects in storage and mount them into compute, and so what we end up having is an object store with built-in compute.

Alright, so if you notice, we use Node everywhere, and so I wanted to go through and highlight just a couple of ways that we use Node in Manta. Here are the places where we use Node as a server, or as a proxy for data. So our front-end—let me highlight this Node process right here, that's our front-end—it's the one that pulls everything together from the auth cache and the index and then streams all the data down to the storage node. It's a Restify server. How many people are familiar with Restify?

Awesome. So everywhere that we use a server in production, we use Restify. There's LDAP.js here. Here we actually have a different protocol for our index tier (we don't use HTTP)—node-fast, if you want to go look that up—and then finally we also use Node down here for DNS; we wrote a custom DNS server over ZooKeeper.

OK, so the next way we use Node is not as a server, but as a controller for other systems. I want to highlight two. This job controller right here—I keep saying there's built-in compute—what it does is, when you submit a job into the system, it will actually mount an object from the storage system into a zone. How many people are familiar with zones, or jails, or Linux LXC? So you have these little mini operating systems running on a bigger operating system, and we actually mount an object from the storage node into the compute zone.

And so I wanted to highlight that, because you can see we use Node up here for what I think most everybody in this room would use Node for—a server—but we're also using it down in the guts for doing low-level operating system stuff. I don't know how many people have been bitten yet by how close Node is to the bare bones, but it's so close that I grind my nose on it at least monthly.

And then this one, this Node process, is also a controller. What this one does is—so we have a Node process living next to each of our Postgres databases, and that Node process coordinates with ZooKeeper for leader election. This Node process coordinates master failover for our DBs. So we get fast failover through a Node app that's basically running a Postgres process.

Finally, the last way we use Node: I put this little ops thing up in the corner because it really only works through the front-end. We actually implemented garbage collection, and metering, and audit in terms of Manta itself. So our index tier will periodically upload indexes into Manta, and then we have this ops box—Node apps all the way down—that kicks off jobs that actually run on that index data and figure out which objects we need to go garbage collect, or that processes our logs to figure out how much we need to go bill you.

So we actually implemented some things in Manta in terms of Manta itself. And when I say in terms of Manta itself, I just want to highlight that the compute—what you actually run, the way that you specify your jobs—is in terms of Unix commands. So we're literally using sort, pipes, uniq, and grep.
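To make that concrete, a job of that shape might be submitted something like this—a hedged sketch using the Manta CLI's mjob create with map and reduce phases. The path, the commands, and the log format are illustrative, not the actual garbage-collection or metering jobs:

```sh
# Hypothetical Manta job: count ERROR lines across a set of stored log objects.
# Each map phase runs a plain Unix command against one object; the reduce phase
# receives the concatenated map outputs on stdin.
mfind /myuser/stor/logs | mjob create -m 'grep ERROR' -r 'sort | uniq -c'
```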

Those are the kinds of commands we're using down in compute, and that's how we implemented garbage collection. So we also use it as kind of our big data processing language as well. So those are the ways that we use Node in production. We also use Node for all our CLIs. So we have internal CLIs and then the external ones.

We also use it in the service deployments; we use Node as our framework for deploying services. And then of course one-off Manta jobs. Because I think that most people here are more interested in how we run Node on our servers, I wanted to focus on kind of four tools that we use in our production deployments of Node, and here they are: Restify, Bunyan, json, and then MDB.

Are these familiar to anybody? Kind of? OK. So I'm going to go through them very quickly and then hopefully the demo gods will be smiling on me today, and everything will go well with the demo. So Restify. Restify is a framework for writing RESTful web services. So a lot of people—pick your favorite Node framework, Express, whatever.

We use Restify primarily because it was written with a focus on observability of the system, and full control over what goes on underneath—full control over the HTTP stream. And, of course, because we're running on SmartOS, there's also DTrace. So when you build Node, you've probably noticed "DTrace provider something, something, something" going by in the build output?

Yeah. So on some systems—Mac OS X, the BSD derivatives, and then the Solaris derivatives—we have DTrace support, and Restify builds in DTrace probes. How many people are familiar with DTrace? Any idea? OK, so DTrace is dynamic tracing, originally written, I think, for Solaris 10. It was basically written for observability of production systems, all the way down in the OS: you have these little probes that fire, and other things that will catch them and do whatever you want with them. It is safe to run in production—that's what it was meant for, to actually go in and observe production systems, because that's where all your problems happen.

And so here, in Restify, it actually fires these little probes whenever you get HTTP requests, so you can, in real time, go and look at how many requests the server is processing without doing what you normally do. So there are links to it—that's not what I want to do.
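As an illustration of the idea—Restify's own probe names vary per server and route, so this is a hedged sketch against Node's built-in DTrace provider instead—a one-liner like this counts requests in real time on a DTrace-capable system:

```sh
# Count HTTP requests per second across running node processes.
# Assumes Node was built with its DTrace provider (SmartOS, OS X, BSD).
dtrace -Zqn '
  node*:::http-server-request { @reqs = count(); }
  tick-1sec { printa("requests/sec: %@d\n", @reqs); trunc(@reqs); }'
```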

The next one is Bunyan. How many people are familiar with Bunyan?
You've got to love Bunyan, so everybody needs to go home tonight and look at Bunyan. It's a one-stop shop for log emitting, plus a tool for processing those Bunyan logs. The logs are emitted as single-line-delimited JSON, and then the tools actually go and process them. So there's the logger that you include in your Node.js application, there's the command-line tool that actually makes everything look pretty, and finally bunyan -p, which does real-time log-level printing and filtering while your system's running in production—I'll show that a little later. Wow, I keep doing that.
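Roughly, the logger side looks like this—a minimal sketch, with the logger name and fields invented for illustration:

```js
// Minimal Bunyan usage: each call emits one line of newline-delimited JSON on stdout.
var bunyan = require('bunyan');

var log = bunyan.createLogger({ name: 'demo', level: 'debug' });

log.info({ remoteAddress: '127.0.0.1' }, 'handled request');
log.debug({ params: { id: 42 } }, 'request params');
```

Piping that output through the bunyan command-line tool is what gives you the pretty-printed view shown in the demo below.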

Alright, and then JSON.
The only reason I included json in all of this is because I use it every day. It's kind of my go-to tool: any time I have anything streaming JSON, I use this to process it. For all of our internal services—Manta, SmartDataCenter, Joyent—everything is a RESTful web service that spits back JSON, and this is what we use for processing everything. So I gave a couple of examples: if you had a field called remoteAddress, this would pull out that field and just print it out on the console. The next one is filtering for records where audit is true, so Bunyan—well, specifically Restify—when you include the audit logger, it always sets that _audit = true.

And so you can easily pick out all your requests from your Bunyan stream using that, because we're dumping all sorts of different logs in there. This makes it easy to filter for just the requests you're actually looking for. And then finally, that last one is just to show that you can also add elements. So not only can you pull elements out, you can add them: -e basically means execute this JavaScript code on this piece of JSON.
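Hedged examples of what those three invocations might look like with the json command-line tool—the field names follow the talk, and the exact flags may differ by version:

```sh
# Pull one field out of a stream of newline-delimited Bunyan records:
cat server.log | json -ga remoteAddress

# Keep only the audit records that Restify's audit logger marks:
cat server.log | json -ga -c 'this._audit === true'

# Add an element by executing JavaScript against each record:
cat server.log | json -ga -e 'this.seen = true'
```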

And then finally MDB—how many people are familiar with MDB? Oh, wow! MDB is what gives us full observability into what screwed up in a Node app. We always run our production systems with this flag: --abort-on-uncaught-exception. How many people have been plagued by uncaught exceptions? Oh, just wait. You will. What this does is, when the Node app hits an uncaught exception—where normally it would just crash—it also produces a core dump, so you can get a core through that, or you can get one by gcoring the process. And then the common commands: you can load up a core in MDB, you can see its stack—where it actually was when it crashed—you can see the arguments to the function it was in when it crashed (I'll show this a little later), print out any random JavaScript object within that core, and actually find them too. It's pretty nice.
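As a rough outline of that workflow (the file names, PIDs, and addresses here are illustrative):

```sh
# Run the service so an uncaught exception aborts the process and dumps core:
node --abort-on-uncaught-exception server.js

# ...or take a core from a live, healthy process without stopping it:
gcore <pid>

# Post-mortem inspection of the core with the JavaScript-aware dmod loaded:
mdb core
> ::load v8          # load the V8/JavaScript debugger module
> ::jsstack -v       # JS stack, with arguments and source for each frame
> <addr>::jsprint    # print an arbitrary JS object found in the core
> ::findjsobjects    # locate objects in the heap by their properties
```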

Alright, Thoth. I'm going to take one minute to explain what Thoth is. So, I said before that we implemented some Manta things in terms of Manta. Thoth is kind of our debugging framework: we have an agent, a little Node app, running on all of our public cloud instances and all of our Manta instances that watches for core dumps, and when it finds them, it will automagically upload them to Manta and index them. And then later we can—there's a command in Manta called mlogin—it will actually drop you into a shell on one of those compute slices and mount your object for you, so you basically get logged into an MDB shell through that, through the front door. So Thoth is our big core-debugging tool; basically we have Node apps that are just crashing all over the place—not really. Sometimes. It happens. Uncaught exceptions—it's my fault, not yours. So Thoth is our way of indexing those, figuring out what problems we have, and then going back and being able to fix them. OK, so I hope I have enough time to actually go through some demo stuff, so let's do that. I have my lovely assistant Fred who's going to hold the microphone for me—he's not actually my assistant, but I do think he's lovely.

So what I wanted to do was give you kind of—can you see that? That's basically the whole thing. Alright, so this app, it's a little toy Bunyan server. What I wanted to do was show you all those things I just mentioned: Bunyan, JSON, Restify, and a little bit of MDB. So let me walk through this app just a sec—can everybody read that?

The red's too dark? I'll read it character by character, then. OK, the first three lines are just variables for stats. I'm creating a Bunyan log, the server actually gets created at the bottom, and that respond in the middle is the handler that actually responds to the request. And then right here, that server.get, right here, /resource—that's Restify's way of handling and routing requests. And then finally, server.on('after', ...), where I'm actually creating the audit record that I was talking about before, and we listen, and that's it. Any questions? GitHub? Oh yeah, I'll get to that. OK, so what I'm going to do is run that server.
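The real code is in the nfitch/node-demo repo mentioned at the end; as a hedged reconstruction of the shape being described (the names, port, and route path are illustrative, and the audit-logger call is the restify 2.x-era API), it looks roughly like this:

```js
// Toy Restify + Bunyan server, sketched from the walkthrough above.
var bunyan = require('bunyan');
var restify = require('restify');

// Variables for stats (introspected later with ::findjsobjects).
var stats = { total: 0, successes: 0 };

// The Bunyan log; debug level so bunyan -p has something extra to show.
var log = bunyan.createLogger({ name: 'demo', level: 'debug' });

// The handler that actually responds to the request.
function respond(req, res, next) {
    ++stats.total;
    log.debug({ params: req.params }, 'handling request');
    res.send(200, { hello: 'world' });
    ++stats.successes;
    next();
}

var server = restify.createServer({ name: 'demo', log: log });

// Restify's way of routing requests: GET /resource -> respond.
server.get('/resource', respond);

// Emit the audit record for every completed request (this is what sets
// the audit flag the json/bunyan filtering keys off of).
server.on('after', restify.auditLogger({ log: log }));

server.listen(8080, function () {
    log.info('%s listening at %s', server.name, server.url);
});
```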

Notice that I didn't print anything anywhere. So over here, I just wanted to tail those logs, and hopefully I still have those guys running. No, they crashed, poor guys. Let's try this again. Come on, guys. You see, the demo gods are not smiling on me. We're screwed. Come on, come on, demo gods. No, no, it's me—hurray! Alright, so that's what a Restify log looks like without JSON, so let me show you what that looks like, if we could get just a little bit…now this is what it actually looks like. So notice name, request parameters, etc., etc. So that's what the Bunyan log actually looks like. But of course, who wants to actually watch this all day long? If you're weird like me, maybe you do.

But most people will want to pipe that through bunyan, and then you get pretty printing, see? So there it goes—pretty nice—so now you see that we're mostly handling 200s and we're good. OK, so your service is running in production, great, alright. So the next thing I wanted to show you was bunyan -p, which is available for you BSD and OS X people, and for the SmartOS people. So what bunyan -p does is, it actually fires up DTrace. Sorry, my bad.
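For reference, the kind of bunyan -p invocations demonstrated next look roughly like this (the PIDs, levels, and field names are illustrative):

```sh
# Attach to a running node process and stream its Bunyan records live via DTrace:
bunyan -p <pid>

# Only show records at a given level or above:
bunyan -p <pid> -l debug

# Filter with arbitrary JavaScript against each record, e.g. by client address:
bunyan -p <pid> -c 'this.remoteAddress === "127.0.0.1"'
```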

Say that I wanted to go and look at debug logs. Here we go. So you now see that we have debug logs, whereas before we just had info logs. Does everybody notice that? Can we do it again? So before, because our log level was info right here—info—we weren't getting debug logs. But here, our logs are still being emitted in the same place, in that other file, but we're running DTrace, which in real time is plugging in and grabbing those debug logs. So here we can do things—for example, we can just filter for errors. I'm typing at the end of my fingers. OK, now it's alright. So for example we can—I don't know if you…we want an error. Now we can see an error, and it only printed the error, 'cause we're looking at error logs. There are other fun things that you can do. You can also filter by, let's say, remote IP address, for example—we can do something like this-dot—or we can filter on params—params equals undefined—and there we go. We only have those debug logs, and not only the debug logs, but just the params, see? So we can filter out things like that in real time. Or maybe here's another cool one: this.remoteAddress is localhost—I'm only running this on my Mac in my dev VM. How many people in production have wanted to go and find all the requests that are coming from one particular IP address?

Wow, come on! Tough crowd—I know you've wanted to. I know you've wanted to. OK, basically that's what Bunyan does for you. The json tool will do the same sort of filtering, so I can cat the logs and pipe them through json and still do basically the same filtering as that. OK, alright, I really want to do the MDB thing even though TJ told me not to, so I'm going to do it.

Alright, so you notice here—let's see, node demo. I don't have a core file there yet; I was just trying to keep myself honest that I would remove my core file. So here we're going to crash the server—and crash. It's got to be dead. So now we have a core, and I'm going to MDB that core file. What this is doing is loading the shared object.

So my dev VM is very, very old—that's why I'm loading a separate one—but normally you just do ::load v8, like this, and everything's happy; my dev VM is just really old. So here, now we can do a ::jsstack and we can see where we were when we died. Thank you, TJ. There's a bunch of Restify stuff in here, but the interesting thing is, note right here: we were in next and respond. You notice where we crashed in this application—it's right here, we're in the respond function. Now what's cool is, if I give it a -v, it actually gives me the code of where it was when it crashed. That's pretty cool. Another thing that's kind of cool: notice that right here I have this arg1, arg2, arg3. Guess what? We can print those out, so here we go—we're going to ::jsprint that guy, and there we go, there's our request right there in JSON. Yeah.

Some of the other cool things we can do: ::findjsobjects. We can look for all objects that have a success property—so remember that success object that I had; I wanted to know how many successes we had before this happened. So this is a representative object—which means you can go read the blog post—and it does pipelining, like normal MDB pipelining, so we can print all the objects. The thing about a core file is it can also find garbage objects, so that's what that first object is, but that second object—there you go. There's total and successes. So I actually introspected the core and found this stats object. So these are the—so Bunyan, Restify, MDB, json, those are kind of our go-to tools, and Thoth—remember how I said Thoth is our thing that uploads into Manta—Thoth will drop you into a compute zone already on the MDB command line, with whatever. OK, so that's my demo, and I'm glad it worked, mostly. Thanks, Fred.

Hopefully I've convinced some of you that SmartOS is awesome. Alright, so I set up this little compatibility matrix to let you know you can use Restify, Bunyan, and json on Windows; you just won't get bunyan -p, the real-time stuff, because that uses DTrace down below. On Linux, you actually do get MDB support.

TJ actually wrote a bunch of stuff that will let you take a core file from a Linux box, and then if you have a dev SmartOS box, you can upload it to Manta and mlogin to it—so you can actually take a core from a Node process running on Linux and introspect it on SmartOS.

So you can do the same thing, which is awesome. On BSD and Mac OS X, you get all that DTrace goodness of bunyan -p and MDB, and then SmartOS—of course we have all of that in SmartOS, and that's what I was running on just now. OK, so that's my spiel for how we deal with Node problems in production, and there are times when we find bugs.

So I wanted to cover just a little bit about how we engage with the Node core team. We use GitHub issues just like everybody else. I do have a direct line to TJ, but I try not to use that very much, and I usually don't. So: use GitHub issues, provide a very clear description, and please include a snippet of Node that reproduces the problem.

So when I first joined Joyent, this was my first one, and Isaac—I gave him an example using curl and he got really mad at me, so he redid it in Node. And then the second one is by an engineer named Dave, and it's probably the best example I've ever seen of a tutorial. So here's a bunch of helpful blogs and tutorials, and if anybody's taking notes right now, you don't need to.

Everything that I just did is in that GitHub repo, nfitch/node-demo. I have my presentation there, so all of these slides are in a big Markdown file, and I also have all those commands that I ran—my cheat sheet—right there as well. So if you want to go back and look and play around with it, you can. Manta's cheap, you can do it in Manta, and other than that, that's me.

Thank you.
