Node.js on the Road: Q&A

Question and answer session for Node.js on the Road from Portland, 3/20/2014.

Ben Acker, Senior Software Engineer, Walmart
Nate Fitch, Software Engineer, Joyent
TJ Fontaine, Node.js Project Lead, Joyent

[audience] What you have like one, two, three, and I was just curious if you had like a series of them?

Oh! yeah.
So those linked directly to a series of talks given by Eran Hammer and TJ at Joyent. So when we switched from—we did what I recommend everybody do when deploying something brand new. We changed on the exact same deploy, the machines we were deploying to, the OS that we were deploying on, and we bumped a major Node version, all while revving to hapi 1.0.

So all of those at the same time, change as much as you can, every single one—that's a lie. What ended up happening was we ended up having loads, there were like a few different memory leaks, most of them were actually with our stuff when we came in. We also had some weird trouble with streams, but there was one memory leak in particular that continued to plague us for a very, very, very long time, and watch those talks because there is some MDB wizardry in there performed by that dude.

I'll post the links to mine too, because the links are in there, and that's what those memory leaks were. And those were solved by the Node [xx].

That was a fix that went in right before Black Friday. I was sweating bullets that I was not going to get that in there. I was really worried, and then Eran's like, oh, we were going to go anyway. I was like, oh my gosh.

Go ahead.

[audience] I'm curious, Ben, what kind of infrastructure do you guys run on? Walmart's proprietary infrastructure, or [xx]?

What is the infrastructure that Walmart is deploying Node on?

It started off, we wanted to go SmartOS from the beginning, but when it started off the only thing that we could get at Walmart that was already approved was Red Hat. So everything started off on Red Hat, and all of our bash scripts were initially deploying there. And then eventually we were able to use a Joyent cloud. Walmart installed it internally, so we weren't actually able to provision machines ourselves, but our ops team would provision stuff for us. So currently Walmart mobile is all running on SmartOS VMs in an internal cloud, so pseudo-proprietary: it's Joyent, but on Walmart hardware.

Does that answer it?

Joyent has a public and private cloud offering. That's my Joyent blurb.

[inaudible audience]

What form of proxy are you doing with Node?

It's basically just a reverse HTTP proxy. We went through a couple of the libraries, and then wrote our own. So it's passing everything through and, depending on what's going through, taking some analytics data and logging stuff. Does that answer?
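Editor's sketch: the pass-through-and-measure proxy described above might look roughly like the following, using only Node's built-in `http` module. This is not Walmart's implementation. The target address, the `record` callback, and the function names are all assumptions made for illustration.

```javascript
const http = require('http');

// Hypothetical upstream address; not taken from the talk.
const DEFAULT_TARGET = { host: '127.0.0.1', port: 8080 };

// Build the options for the upstream request: same method, path and headers,
// with the Host header rewritten to point at the upstream.
function upstreamOptions(req, target) {
  return {
    host: target.host,
    port: target.port,
    method: req.method,
    path: req.url,
    headers: Object.assign({}, req.headers, {
      host: target.host + ':' + target.port,
    }),
  };
}

// Returns an http.Server that forwards every request to `target` and reports
// method, URL, status and elapsed milliseconds to the `record` callback.
function createProxy(target, record) {
  return http.createServer((req, res) => {
    const started = Date.now();
    const upstream = http.request(upstreamOptions(req, target), (upRes) => {
      res.writeHead(upRes.statusCode, upRes.headers);
      upRes.pipe(res);
      upRes.on('end', () => {
        record({
          method: req.method,
          url: req.url,
          status: upRes.statusCode,
          ms: Date.now() - started,
        });
      });
    });
    upstream.on('error', (err) => {
      // The upstream being unreachable is answered with a 502 and recorded.
      record({
        method: req.method,
        url: req.url,
        status: 502,
        ms: Date.now() - started,
        error: err.message,
      });
      if (!res.headersSent) res.writeHead(502);
      res.end();
    });
    req.pipe(upstream);
  });
}
```

To try it, something like `createProxy(DEFAULT_TARGET, console.log).listen(3000)` would forward local port 3000 to the assumed upstream and print one timing record per request.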

Well, you say exterior versus interior?

[inaudible audience]

[TJ] What's your front door look like?

I don't know what I'm allowed to talk about. In one of the links on there, Eran says as much as we can; I think he gives the full layout as far as we can go.


Another question?

[audience] I was just wondering about Node 1.0. Next week? Two weeks?

The question about when Node 1.0 is coming?

[audience] Not really, what I was wondering is, I've heard Isaac say some things before about how Node might eventually not incorporate anything in ES6, so say like the module system, for example. Do you have any opinion on, I mean, is Node going to stick with require or…?

So the question is kind of what's going on with Node and the ES6/Harmony features in the future? Particularly with regards to the module system. Either of you guys want to take that question?

Okay, I'll take that question.
At the moment, Node is mostly following the lead of V8. When V8 enables a feature for Chrome, that's when Node adopts that feature as well, because we're pretty closely coupled at the moment with V8. So when V8 enables ES6 and other kinds of features by default, that's when it'll come into Node.

The key thing to keep in mind when they add these features like the module system, what Node's commitment is to the whole community, is that we're going to be backwards compatible. So even if we do adopt the new module system, all the code that you're writing today will continue to work with Node going forward. It will be some interesting times when we try and make the two things work together.

The good news is the community is already working around that and trying to solve those problems today, and make sure the specs match in a world where we can actually work in that way, so Dominic [xx] has been doing a lot of that hard work, so it's excellent to have him, but for right now, we're going to follow V8's lead of when that feature gets enabled and we'll extend Node when it makes sense.

Any other questions? So, how does Joyent do deployments? We had a question about the front door for Walmart, which Ben pointed to links for. How does Joyent handle the front door? What kind of deployment mechanism do you use for Node? You talked about it briefly, but can you go a little bit more in-depth?

What's the layout, are you using the cluster module? Are you.

Oh! yeah. Sure. So what Joyent does, at least for Manta, we usually run an HAProxy in front of many Node processes, and then each of those go off box, so each of those services that you saw up there is in its own zone, and then [xx]. And in terms of how we deploy?

So TJ mentioned Joyent's private offering for public and private clouds, so on Manta one of our deployments is actually deployed on top of what Joyent calls our SmartDataCenter, so it's kind of our way of deploying [xx] servers.

So it's kind of Docker-ish insofar as it's like images: when they do a deploy, they make an image, and then they go provision that instance, and when they want to roll back, they can just go back an instance. You guys, how does Walmart, when you have a failed build, what's your deployment process?

What kind of artifacts are you keeping for your builds, and how do you manage that?

When there's a failed build?

If you have a regression in your…

Oh! okay, we just find something and follow the exact same thing to push it right back out. So Jenkins, which is your favorite, I know, basically is our cron job for all of our bash scripts, and that's it. As part of doing the npm install, it does some stuff to get some SHA sums on it, so there's a unique identifier for each thing. It does the npm install for production, then the shrinkwrap, then it tarballs that, and we maintain that tarball.
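Editor's sketch: one way to give each build tarball a unique identifier from a checksum, as described above. The talk only says "sha sums", so SHA-256 is an assumption here, and so is the naming scheme; `shaSum` and `artifactName` are hypothetical helpers.

```javascript
const crypto = require('crypto');

// Hex SHA-256 digest of a build tarball's bytes (the algorithm is assumed).
function shaSum(buffer) {
  return crypto.createHash('sha256').update(buffer).digest('hex');
}

// Name an artifact after the app, its version, and a short digest prefix,
// so two builds with different contents never share a file name.
function artifactName(app, version, tarball) {
  return app + '-' + version + '-' + shaSum(tarball).slice(0, 12) + '.tgz';
}
```

A build script could then call something like `artifactName('my-app', '1.4.2', fs.readFileSync('build.tgz'))` after the install and shrinkwrap steps and keep the tarball under that name, which makes redeploying an earlier build a matter of picking an older file.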

So, either we'll, usually we'll just push a fix out, I can't remember a time when we've rolled back to a previous one. I mean it probably has happened but, I'd miss a lot of sleep. So, I can't remember what [xx]

Go ahead.

[audience] So, I have a question, sort of [xx]. So recently there was a sort of summit on error handling in Node. I was wondering if you could speak to like what's the plan going forward for things like domains or [xx]. What is the future in summary for the [xx]?

OK, so the question is, there was recently a summit of sorts in the Node community around error handling, and what was the result of that summit? There's going to be a blog post pretty soon, and it is going to have some documentation about how Joyent handles errors as part of that conversation. That's going to be submitted back to the community to refine, and to form some documentation about what the community considers best practice as compared to what Joyent is doing in production today.

The conversation about that is whether or not you should be using domains. If you're using domains today, I would ask you not to, but it's a difficult conversation. To break it down and give you a TL;DR summary: there are operational errors and programmer errors. An operational error is something that happens in the normal course of the operation of interacting with the system or a remote service, and you either anticipate that or you don't anticipate it.

If you didn't anticipate that, that escalates to a programmer error because you didn't handle that error case. Another kind of programmer error is like the normal syntax or reference error that you hit at run time. Because of the postmortem debugging tooling that we've written, we generally prefer to crash immediately at that point in time, because you never understand the blast radius of when that programming error actually happened, and you don't know the state of the rest of your application and the interconnected services. Nate can probably talk a little bit more about this, but in one case we had an unhandled error that we didn't realize had happened, which resulted in a transaction in Postgres being held open, and it just stayed there for three weeks. We didn't know about it until finally queries just started slowing down over time.
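Editor's sketch: a small illustration of the operational versus programmer error distinction drawn above. The config file, the defaults, and both function names are hypothetical; this shows the general idea and is not Joyent's code.

```javascript
const fs = require('fs');

// Hypothetical defaults used when no config file exists.
const DEFAULTS = { port: 8080 };

// Operational error: the file may legitimately be missing, so ENOENT is
// anticipated and handled. Anything unanticipated is rethrown, so the
// process crashes instead of continuing in an unknown state.
function loadConfig(path) {
  try {
    const parsed = JSON.parse(fs.readFileSync(path, 'utf8'));
    return Object.assign({}, DEFAULTS, parsed);
  } catch (err) {
    if (err.code === 'ENOENT') return Object.assign({}, DEFAULTS);
    throw err;
  }
}

// Programmer error: passing a non-numeric port is a bug in the caller.
// There is nothing sensible to recover to, so throw immediately.
function listenPort(config) {
  if (typeof config.port !== 'number') {
    throw new TypeError('config.port must be a number');
  }
  return config.port;
}
```

The design choice is that only the one anticipated failure gets a handler. Everything else is left to crash at the point where it happened, which is what keeps the stack and arguments available to postmortem tooling.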

Then we found out that three weeks ago we had an error that resulted in that. Even when you're trying to just die whenever an error happens, you don't necessarily understand the blast radius: it can be localized to the single Node process, or it can extend into the rest of your environment. So domains have use cases in very, very narrow pieces where you can guarantee the layout of your environment. In my estimation, and this is your Node core project lead saying this, this module is kind of unsafe at any speed, because we can't actually guarantee that when you handle an error in the domain error handler, you're actually cleaning up all the resources that are involved in that. So that's where the community is going to get more formalized around error handling, and Joyent is going to post some information about how we handle errors.

I don't know if Walmart wants to talk about how they handle errors?

Um…we use domains.

So I kind of set that up for that.

Yeah, any other questions?

[audience] Why shouldn't we use domains?

Why shouldn't we use domains?

So basically kind of the promise of domains was that you're going to be able to set up a single error handler, and you're going to go over here and collect all your resources and dispose of them at that time.

That sounds like a great promise. In practice, you have to make sure that when you put that error handler in there, you have visibility into all of the places where you're holding on to those resources. JavaScript's a very powerful language insofar as it lets you shoot yourself in the foot all the time, and you may or may not have access to all the places where you're holding those resources. You may be holding them in a closure somewhere else that you didn't realize, so being able to clean them all up at run time may or may not be something that we can guarantee. It's really scary. You don't always understand the blast radius of the error, and you may or may not be able to have access to it.

If you can discretely define these operations and put those error handlers in place more often, then you will be able to see where all those things are happening. But the promise of domains was like, we're going to attach all of these different event emitters and asynchronous operations into this world, and then magically clean it all up for you. It actually requires more domain expertise for your application to understand discretely where those things need to happen, and we just can't do it.
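Editor's sketch: the alternative to one catch-all handler is to scope cleanup to each discrete operation, so the code that acquires a resource is also the code that releases it. The helper and the in-memory pool below are hypothetical and synchronous for brevity; real database work would be asynchronous, but the shape is the same.

```javascript
// Run `work` with a freshly acquired resource and always release it,
// whether `work` returns normally or throws.
function withResource(acquire, release, work) {
  const resource = acquire();
  try {
    return work(resource);
  } finally {
    release(resource);
  }
}

// Hypothetical in-memory pool standing in for a connection pool.
// `open` counts resources that have been acquired but not yet released.
function createPool() {
  const pool = { open: 0 };
  pool.acquire = () => {
    pool.open += 1;
    return { id: pool.open };
  };
  pool.release = () => {
    pool.open -= 1;
  };
  return pool;
}
```

Because the release is tied to the operation itself, an error inside `work` cannot leave the resource held by a closure that a distant error handler has no way to reach.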

There's not a technical solution to a programming problem in that regard, because if you could have fixed this error with more code, you probably would have used that code in the first place instead of writing code that had a problem. So I don't know, that's why you don't use domains.

You use domains for error handling for the logging, to get more information. So my question back for you there is, is that just so you can do more asynchronous operations before the process dies? Do you still die after you get that unhandled error? [yes] OK, that's about as safe as this is ever going to be. Go ahead and log and send something back out, but then die as fast as possible. The reason I don't prefer that is because I've lost all the state at the time the error actually happened. With MDB, I like to have my call stack and all the arguments that caused the error at that point in time. By the time you're back in the error handler, that context is gone, so you don't have the stack at the point in time when the error happened. You have the error object that has a stack on it, but it's not exactly the same kind of debuggability that you have at that point in time. But that is a perfectly valid use case, because not everybody has access to the tooling that we get to use.
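Editor's sketch: the "log, send something back, then die as fast as possible" pattern described above, written with Node's `domain` module (which still ships with Node but is documented as deprecated). `failFast` and `createServer` are hypothetical names, and this is not Walmart's code.

```javascript
const domain = require('domain'); // deprecated, but still part of Node core
const http = require('http');

// Log the error, make a best-effort attempt to answer the in-flight request,
// then exit. `log` and `exit` are parameters so the function is easy to test.
function failFast(err, res, log, exit) {
  log(err && err.stack ? err.stack : String(err));
  try {
    if (!res.headersSent) {
      res.statusCode = 500;
      res.end('Internal Server Error\n');
    }
  } catch (ignored) {
    // The response may already be unusable; exiting matters more.
  }
  exit(1);
}

// Wrap each request in its own domain whose only job is to call failFast.
// The handler never tries to keep the process alive after an error.
function createServer(handler) {
  return http.createServer((req, res) => {
    const d = domain.create();
    d.add(req);
    d.add(res);
    d.on('error', (err) => {
      failFast(err, res, console.error, (code) => process.exit(code));
    });
    d.run(() => handler(req, res));
  });
}
```

Note the trade-off raised in the answer: by the time `failFast` runs, only `err.stack` is left, not the live call stack and arguments that a core dump taken at the moment of the throw would preserve.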

No, you do actually, you do have access to it.

Well, you could be on Windows, you could be on FreeBSD. Does anybody deploy in production on OS X? This is something I ask randomly. I used to ask if anybody, you, what?

[audience] For some values of production. I'm not talking like a website scale thing or anything like that, but I do run a lot of services in my house that, hardware and all sorts of other things that do aggregate data and have to run on my laptop.

Okay, I will qualify it better: is anyone running a startup or other company website on Darwin?

Okay, I used to ask that question about who deploys in production on Windows, and then I found out about NBC News Digital. How many people know the Today website, for the Today TV show? The Today mobile website is now running Node, and they're deploying that on Windows. I was excited and frightened all at the same time, but it's really exciting to know that people are out there, yeah, I'm sure it's great, yeah.

So anyway, yeah, so great, cool, any other questions? Anybody interested in how these guys are running Node in production or questions about how you can better run, or I don't know. What's the meaning of life?



We all knew that already.

OK, sorry, I mean, I asked that to Ben earlier today and he had a different answer, so.

I wanted to see lemurs tap dance, that was my answer.

Alright, go for it.

[audience] Obviously, to go ahead and just shut down your current existing systems and replace them with Node [xx], so how do you recommend current systems be slowly [xx] over to from like say PHP or other older [xx]?

How do you move your existing infrastructure or existing application to Node.js from Java, PHP, etc. without disrupting your existing client services?

You've got it.

I'll pass it down.

You could put a proxy.

But, seriously, one of the goals that we had was to throw the proxy over the top. Some of the analytics that we would take was how long those services would take, so we would basically have a backlog of which ones we wanted to replace first. As long as the replacement maintains the same contract, you can replace all that stuff and still maintain your legacy system underneath that proxy, and then new development happens in Node. Huzzah! Old stuff is replaced as you get time and have need.
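Editor's sketch: the migration approach described above comes down to two small decisions inside the proxy: which backend serves a given path, and which legacy routes to replace first based on measured response times. The backend addresses, route prefixes, and function names below are all hypothetical.

```javascript
// Hypothetical backends and migrated route prefixes; none come from the talk.
const LEGACY = { host: 'legacy.internal', port: 8080 };
const NODE = { host: 'node.internal', port: 3000 };
const migrated = ['/cart', '/search'];

// Requests under a migrated prefix go to the Node service; everything else
// still goes to the legacy system behind the same public contract.
function pickBackend(url, prefixes) {
  const path = url.split('?')[0];
  const hit = prefixes.some((p) => path === p || path.startsWith(p + '/'));
  return hit ? NODE : LEGACY;
}

// Rank routes by average response time, slowest first, from timing samples
// of the form { route, ms } collected by the proxy.
function slowestFirst(samples) {
  const totals = new Map();
  for (const { route, ms } of samples) {
    const t = totals.get(route) || { sum: 0, count: 0 };
    t.sum += ms;
    t.count += 1;
    totals.set(route, t);
  }
  return Array.from(totals, ([route, t]) => ({ route, avgMs: t.sum / t.count }))
    .sort((a, b) => b.avgMs - a.avgMs);
}
```

Migrating a route then means adding its prefix to the list once the Node replacement honors the same contract, and the ranked output is one possible way to build the replacement backlog.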

You want to talk about that?
Also they're not here to represent themselves, so I'll represent Yahoo and PayPal. On the Yahoo homepage, they have a bunch of different little widgets, and some of those widgets are PHP and some of the widgets are Node.js. More and more, those widgets become Node.js over time, and so that's how they handle it. Node is really great at doing little web servers, so every one of those little widgets is a different Node web server running back there, and so they can just replace them each at a different time, and they roll out in that way.

PayPal did kind of a two-team effort, where they had one team reimplementing their dashboard thing so it didn't look like it was from the 1990s, in Java. That was taking a while, and Bill Scott came in and started work on another project where they took a smaller team and just started implementing it in Node.js, and they got the dashboard thing done in a fraction of the time, and that's how they made that decision. So they picked one route, and they just made that one route Node. And then they're going over more of the routes and making those pages Node. So that's a pattern that we're seeing as well. Some people go ahead and do the night switch: one night, at whatever their low traffic time is, they just switch their whole site over. That's a scary, scary proposition, and I worry for those people and their blood pressure, but there are people that still do that as well.

Groupon is definitely in kind of the world of, we're going to switch from Rails to Node, and we're just going to do it. More power to those people. I love hearing about more people using Node. The other thing I like is, for PayPal, they're actually going through and retraining all their happy Java folk into the Node world, so they spend a day with Douglas Crockford, and then a day unlearning all of those things, so that happens.

Yeah. You don't have anything else?



Nate, do you have anything?



[audience] What's your favorite animal?

Oh, what's your favorite animal?
I'll start with you Nate.

Oh! creepy.
I'm going to creep everybody out, but they're actually bats.


Yeah, bats.

Any particular?


But they're the only flying mammal.



You're so cuddly yourself.
That's what it is.

I leave them on every whiteboard.
Any whiteboard that I can leave a cuttlefish on, I draw a cuttlefish on.

Oh! yeah.

I'm particularly fond of sloths myself.

There's a sloth sanctuary?

Here, and you can go there and feed them. And the website is Chasing-tail, and they're here in Oregon.


Yeah, I'm excited.

I wonder if I'm the type for that.
Yeah, it's one of those ones where you probably have to get pretty tight right. I remember trying to get to the, don't, there are about, we're not having that conversation.

Any other questions from you guys?

[audience] Current Mario Kart champion at Joyent, who's that?

It's still Dave Pacheco, the current Mario Kart 64 champion. In case you missed it, that's a reference to this thing called Kartlytics. Nate talked about Manta, and if you listen to the story you'd find out that Manta was actually written to show Dave's superiority in Mario Kart, so they upload a bunch of videos and analyze them and batch process them and find out how awesome Dave is.

So if you check that website out, it's pretty sweet. You can upload your own videos and see how you rank and do those kinds of things.

[inaudible audience]

The question is, we're doing this road show, so what kind of feedback are we looking for? Me personally, in a selfish kind of way, I want to know how you're using Node, where it's working for you, and in particular what modules you're using and what you like, either in core or in the ecosystem. What modules in core aren't working for you?

I can't fix the ecosystem part of that, but I'm interested in finding out which modules should potentially be included as a first class feature in Node if they're getting a lot of adoption, and if there's a pattern that we're doing that could be easier for the community. And I want to know if there's something that isn't working for you. Our HTTPS and TLS stuff has seen a lot of work recently, but it still has a long way to go, and our HTTP client has a long way to go.

So there are things that I already know about that need to be fixed, and I want to know how you guys are using it, or if there's some killer feature that's just totally missing from Node that you would love to see, and don't say promises.

[audience] Is there, in the vein of HTTP/HTTPS, is there going to be a point at which I don't need to use Request to be able to request domains regardless of their HTTPS nature?

So the question is whether or not we'll have a protocol-neutral Request kind of library. I'm not sure if we're going to be able to do that. There's that kind of question, and then there's also the question about retry: looking up all the IP addresses and, if one fails, failing over to another.
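Editor's sketch: the "look up all the IP addresses and fail over" behavior mentioned above can be built in user space on top of `dns.lookup` with `{ all: true }`. `firstSuccess` and `connectAny` are hypothetical helpers, and `attempt` stands for whatever connection logic the caller supplies.

```javascript
const dns = require('dns');

// Try each candidate in order. Stop at the first success; if every attempt
// fails, report the last error seen.
function firstSuccess(candidates, attempt, done) {
  let index = 0;
  let lastError = new Error('no candidates to try');
  function next() {
    if (index >= candidates.length) return done(lastError);
    const candidate = candidates[index++];
    attempt(candidate, (err, result) => {
      if (err) {
        lastError = err;
        return next();
      }
      done(null, result);
    });
  }
  next();
}

// Resolve every address for `host`, then fail over across them in order.
function connectAny(host, attempt, done) {
  dns.lookup(host, { all: true }, (err, addresses) => {
    if (err) return done(err);
    firstSuccess(addresses.map((a) => a.address), attempt, done);
  });
}
```

Keeping the policy (ordering, how many retries, what counts as failure) in a small user-space function like this is in line with the point made in the answer that core should avoid making opinionated decisions for you.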

The question is how difficult that is to achieve from user space versus what core provides. What we want to make sure is that core's not making an opinionated decision for you, and is providing the APIs that user-space modules need, so those modules can be really discrete and easy to audit, and once you know that you can trust them you can install them.

Request has been filling a need for a long time, and it's probably not going away anytime soon. In general, it's a little bit heavier than I would particularly care for from a library, but where abstractions are hard to get right in user space, that's a candidate for core. So if there are things that aren't working in user space or are difficult, we can adopt something that's working, or if you know that there's something out there and you want to sponsor that part of it, bring it to us in a pull request, and we'll work with you on getting that in.

Go for it.

[audience] So what are some existing user space modules you're thinking about bringing into core?

So what are some existing user space modules that we're bringing into core?

One is already included for 0.11: Contextify is now the backing store for the vm module. So if you're using jsdom in your deployment, it won't have to download and install contextify. If you're using any of the stuff that Nate talked about, Restify, Bunyan, etc., that relies on the DTrace provider, the DTrace provider is being pulled into core as well. And that's going to include just DTrace support and ETW. The easiest candidates for us to identify are popular binary add-ons. So people who aren't able to do the things that they want in JavaScript, we need to fill that need.
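Editor's sketch: for readers unfamiliar with the `vm` module mentioned above, this is what it does from the user's side. A script runs against a separate global object (the context) rather than the real one. The variable names are illustrative only.

```javascript
const vm = require('vm');

// Evaluate an expression against a fresh context built from `sandbox`.
const sandbox = { price: 20, quantity: 3 };
const total = vm.runInNewContext('price * quantity', sandbox);

// Globals created by the script land on the sandbox object,
// not on the real global object of the calling program.
vm.runInNewContext('discounted = price * 0.5', sandbox);
```

After these two calls, `total` holds the product and `sandbox.discounted` is set, while the calling program's own globals are untouched. The `vm` module isolates globals, but it is not a security boundary for untrusted code.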

In a 1.0 world, it's possible that we would add something like FFI into core. So even if there's something you can't do in core right now, where you need an external library but you really don't want to compile it, you could do something like FFI as a first class citizen. So those are the kinds of things we identify, but there are other things too from pure JavaScript that would make sense potentially as well.

Especially, if it's a hard pattern to get right. So there are common data structures that sometimes just need to be done, like we're not going to go the Java route, but there are some things that, like, Node itself needs some tree structures, and there's no reason why we couldn't include a working tree implementation in Node.

And everybody needs to do line-based processing, or a lot of people need to do line-based processing. There's no reason we couldn't have a Streams line-based thing that just does the right thing, and you don't have to figure out which module it is that you want to use. So there's all kinds of things like that, and I've written some of those that I would love to just kind of put in there, so that's some of the stuff that's a better candidate for moving forward with Node.
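Editor's sketch: a "Streams line-based thing" of the kind described above can be written as a small Transform stream. This is an illustration, not the speaker's module and not a core API; `LineBuffer` and `createLineStream` are hypothetical names. Core's `readline` module can also read a stream line by line, though it is not itself a Transform stream.

```javascript
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Accumulates text and hands back only complete lines.
class LineBuffer {
  constructor() {
    this.pending = '';
  }

  // Returns the complete lines now available, keeping any trailing
  // partial line until more text arrives.
  feed(text) {
    const parts = (this.pending + text).split(/\r?\n/);
    this.pending = parts.pop();
    return parts;
  }

  // Returns the final unterminated line, if there is one.
  flush() {
    const rest = this.pending;
    this.pending = '';
    return rest === '' ? [] : [rest];
  }
}

// Transform stream: bytes in, one string per line out (object mode on the
// readable side so that empty lines are preserved).
function createLineStream() {
  const decoder = new StringDecoder('utf8');
  const lines = new LineBuffer();
  return new Transform({
    readableObjectMode: true,
    transform(chunk, encoding, callback) {
      for (const line of lines.feed(decoder.write(chunk))) this.push(line);
      callback();
    },
    flush(callback) {
      const tail = lines.feed(decoder.end()).concat(lines.flush());
      for (const line of tail) this.push(line);
      callback();
    },
  });
}
```

Usage would be along the lines of `process.stdin.pipe(createLineStream()).on('data', (line) => console.log(line.length))`. The `StringDecoder` is there so that a multi-byte character split across two chunks is not corrupted.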

