Node.js on the Road: Q&A

Node.js on the Road is an event series aimed at sharing Node.js production user stories with the broader community. Watch for key learnings, benefits, and patterns around deploying Node.js.

SPEAKERS:
TJ Fontaine, Node.js Project Lead, Joyent
Scott Rahner, IT Manager, Dow Jones
Craig Spaeth, Web Engineering, Artsy
Brennan Moore, Web Engineering, Artsy
Roberto Masiero, SVP, ADP Innovation Labs

[audience] I'm just curious how you guys have dealt with keeping the file output size really small, that's one of the problems that we're currently facing when we use it.

So yeah, you're talking about just keeping the JS asset bundle small? Yeah. So, like I was showing with the organization of apps and components, we actually have asset packages that are specific to each page. For the most part, every page on Artsy has its own app, and therefore we have an asset package that literally goes into the app and goes into the individual components and builds just the specific ones necessary to render the page. So we can really tailor our packages down to that very specific page, which keeps them pretty small. On top of that, we also group any common dependencies amongst all the apps into their own asset package, so there are two different asset packages.

One is the common dependencies: we point to the CDN for jQuery, but common things like Backbone that are used across apps go in there. The second package is only the JavaScript and CSS you use for that specific app. We are looking into how we can use Browserify to minimize the redundancy across the two packages; it's something we haven't really dived into too far yet, but that's mostly how we do it.

[audience] Brennan, you mentioned the memory leaks, and I had a question, maybe for you or TJ. Why is—I have this problem with PM2. I see the memory going up through the roof, and then at some point leveling off. You guys know anything about this?

Yeah, for us, we have limited memory on our Heroku dynos, so it peaks out at around a GB, because we're using the larger dynos, and then it restarts and goes back up.

As far as seeing it reach a certain point: 1 GB is actually kind of small in general for what you're doing. V8, by default, has a 1 GB maximum resident size for the heap space in a 32-bit process; in a 64-bit process it's 1.5 GB. So if you're not changing Node in any way, the most you're going to get out of it is 1.5 GB of memory usage.

When you hit the 50% mark of capacity, you actually tell V8 to start running the GC more. So if you were seeing that magic number right around 750 MB, that's probably why you see it climb to that point and then hold there. That's usually indicative of just creating a lot of garbage objects, temporary objects that aren't used for long periods of time, so when the GC runs, it can keep itself pretty stable at that point.
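To see where a process sits relative to its heap limit, a minimal sketch along these lines can help. It assumes a Node.js release recent enough to ship the built-in `v8` module (newer than the 0.10 and 0.12 releases discussed in this session), and `heapReport` is a made-up helper name. The limit it prints depends on the Node version and on flags such as `--max-old-space-size`, so it may not match the figures quoted above.

```javascript
// Sketch: report how close the V8 heap is to its configured limit.
// `heapReport` is a hypothetical helper, not part of Node.
const v8 = require('v8');

function heapReport() {
  const stats = v8.getHeapStatistics();
  const toMB = (bytes) => Math.round(bytes / (1024 * 1024));
  return {
    usedMB: toMB(stats.used_heap_size),
    limitMB: toMB(stats.heap_size_limit),
    // Fraction of the heap limit currently in use (between 0 and 1).
    usedFraction: stats.used_heap_size / stats.heap_size_limit,
  };
}

const report = heapReport();
console.log(`heap: ${report.usedMB} MB used of a ${report.limitMB} MB limit`);
```

Logging this periodically makes it easier to tell a plateau (garbage being collected under pressure) from a steady climb toward the limit, which is more suggestive of a leak.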

If you're interested, we've done a lot of work on this, I personally have chased down some really nasty memory leaks. I did a blog about four bytes that wasted my life for three weeks. But if you check out joyent.com/developers/node, there's a bunch of information about debugging Node specifically on the memory leak stuff that we've done, so if you're interested (you guys as well, I'd love to talk about some of that stuff), but there's tooling out there to investigate what's actually going on and you can figure out where your memory leak problems are.

[audience] Okay. Okay. I have a question and an observation. The demonstrations have one thing in common: all of them are related to the frontend and the web to some extent, to user-facing clients. But I can see from the roadmap that we have these fantastic data components like Streams.

I can see a good future for Node specifically in big data analytics, at the data center level, for example the [xx].js, you know, the Joyent Manta kind of thing. I just want to see how you're going to promote that, because the conventional view of Node is that it's JavaScript, which is something to do with HTML, but that's not the case, right?

I would say from purely Node point of view, it's all about Non-blocking I/O and Streams, so I want to see how you guys want to promote that direction.

The streams API is pretty powerful and is used all throughout by a lot of people, especially in the ETL path, where you're doing a lot of extract, transform, and load, which you see all the time in big data. I've personally done this a ton of different times, and as an aside, Joyent has a big data product written entirely in Node underneath the hood. If you want to check that out, it's called Manta. Node and big data are a future that's coming, and it's here now in terms of Manta and other projects that people are doing. Certainly the semantic search stuff that you guys are working on is also part of that conversation, so there are people doing big data in Node today. We can do better at getting that story out there. One thing I want people to keep in mind is that JavaScript as a language is not always the right tool for the job.

So if you're doing a lot of high precision, computational, numerical transaction things, like probably you don't want to be using JavaScript anyway.

But moving data across, shunting data from one service to another service, performing analysis on that, using Node as the glue for the rest of your infrastructure: Node is a perfect fit for that. So yeah, big data and Node are in love.
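As an illustration of the extract-transform-load pattern described here, the following is a minimal sketch built on Node's `stream.Transform`. The `name,amount` record shape and the `parseLine` and `createLineToJson` names are assumptions made up for the example; they do not come from Manta or any project mentioned in the talk.

```javascript
// Sketch: a tiny transform step for an ETL pipeline built on Node streams.
const { Transform } = require('stream');

// Pure transform: parse one "name,amount" line into an object.
// The record shape is an assumption for illustration.
function parseLine(line) {
  const [name, amount] = line.split(',');
  return { name: name.trim(), amount: Number(amount) };
}

// Transform stream that turns lines of text into newline-delimited JSON.
function createLineToJson() {
  let leftover = '';
  return new Transform({
    transform(chunk, encoding, callback) {
      const lines = (leftover + chunk.toString()).split('\n');
      leftover = lines.pop(); // keep a partial last line for the next chunk
      for (const line of lines) {
        if (line.trim()) this.push(JSON.stringify(parseLine(line)) + '\n');
      }
      callback();
    },
    flush(callback) {
      // Emit whatever remains when the input ends without a final newline.
      if (leftover.trim()) this.push(JSON.stringify(parseLine(leftover)) + '\n');
      callback();
    },
  });
}

// Possible wiring, reading from stdin and writing to stdout:
// process.stdin.pipe(createLineToJson()).pipe(process.stdout);
```

Because each chunk is processed as it arrives, memory use stays roughly flat regardless of input size, which is the property that makes streams a good fit for shunting data between services.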

[audience] Hi, Scott, you talked about this a little bit, but how do you manage internal packages, things like wrappers around MySQL connections, stuff that is very specific but that you want to use in multiple different Node projects? It's really for anybody, but I know you did mention it a little bit.

Yeah, yeah absolutely. We have this NPM registry internally, and basically, even in development, when you publish something, it goes there, it doesn't go to public, right? It goes there. And then that registry also has, or at least has access to everything that's in public, so you can do the straight up require package.json stuff and say, hey! I just want this module that I just created, and you can do that in any of your projects, or this dependency injection thing that we have which internally the framework is doing the require and caching that, actually it's instantiating the object and creating an instance and holding that in cache for that request, and I believe, even across requests that require itself is cached, so those are the two modes which we share. It's all in NPM right now, we are doing some GitHub stuff where you can pull things directly from GitHub but it's mostly NPM, and it's either through the traditional route or this dependency injection.

[audience] For the dependency injection, do you support like, multiple different versions of this, like so that if somebody is running one part, if we have to change, if you had to change for a newer thing so that you could pull different ones?

Right. So within one app you can only be running one version at a time. So if it's in what we call an app (there's a way to do this where you create an application and all its configuration is for that application), you'll serve one version of that thing. But you could easily have two applications sitting on a server or across servers, and yes, they could use different versions from NPM at the same time.

[audience] Thank you.

[audience] Yeah, I just wanted to say first, awesome job with the dynamic tracing stuff going into 0.12, super cool. I don't know, maybe Bryan wants to jump in to help answer this, but I was just wondering if there's a story for, I guess, some other level of features with DTrace on Linux as well as SmartOS in the future. Because, I mean, I think it's an extremely awesome, amazing tool, and a lot of people don't get a chance to use it just because they're afraid to try something other than Linux.

[Bryan Cantrill] Did TJ pay you to ask this question? If he's bribing you in any way.

Give me the money later.


[Bryan Cantrill] TJ, do you want to?
[TJ] No, by all means, inventor of DTrace, please have a go.

[Bryan Cantrill] So DTrace is open source. It's been open source for almost a decade. I kind of assumed it would be ported to Linux by now. There are two kind of sporadic ports: one that's somewhat incomplete, and one that's much more complete but has been done by Oracle, so they're really not focused on anything other than Oracle Enterprise Linux. We still stand at the ready to help anyone that wants to do it. It's ported to FreeBSD, it's on your Mac, and actually the Mac remains the most complete port.

[Bryan] I think one of the challenges that we've got, is that you really need a complete port of DTrace in order to get—because the user level instrumentation, and in particular the ustack helper that allows you to associate an in-kernel event with an entire JavaScript stack backtrace. Which if you haven't seen you should see, I'd be happy to demo it for you at some point. The number of kittens that needs to be slain in order to bring that to you is unspeakable. I mean you have never seen a sausage factory like this one.

[Bryan] But the sausage is delicious. It may be kitten sausage, but it's delicious. Getting that sausage factory on to other operating systems is a really, really heavy lift. And we've seen, the Mac remains the only real complete port. FreeBSD kind of behind that, and some of the other guys behind that, so I don't know when you're going to see it on Linux.

[Bryan] Now the thing I would say though, and you were talking a little bit about memory leaks, and TJ was mentioning some of the tooling that we've got around that, one of the things that we have done, and I think it's been actually due to an observation from TJ, which has been actually helpful, is to take the post mortem analysis that we have for SmartOS, which is extremely rich.

[Bryan] So we were coming to this, I was coming at this from a kernel developer's perspective, where when the kernel dies, we take a crash dump, and we debug it and we fix the problem. We wanted to do the same thing with Node, and Dave Pacheco at Joyent solved the problem that I actually thought was impossible to solve, but I didn't tell him that.

[Bryan] And allowed us to take a crash dump, a core dump from Node and get to JavaScript level state. So, from a core dump you can see your JavaScript level objects. You can't do this in any other dynamic language. Don't let the Java guys push you around on the playground, because Java can't do this and JavaScript can, so take that Java and Erlang bigots. The one downside of this technology was that it was very much SmartOS specific, and TJ had this great observation that hey—and it's great, if you're running on SmartOS and God bless you, you should, run at Joyent, you can run it in your own private cloud on SmartOS, but if you're not, you can take a Linux core dump, and TJ wrote a little shim that allows us to interpret a Linux core dump on SmartOS. You can upload it to Manta, and if you've got a Node support contract from Joyent, and we're supporting your Node in production on Linux, that's how we're doing it. So if you see a memory leak, and actually with the guys from Artsy, I want your core dump man. We can upload that and we can actually go through and find the JavaScript objects that are responsible for that leak, even though you're on a totally foreign system.

[Bryan] So, we're very mindful of the problem, and we have created very rich tooling on SmartOS with respect to Node, and we know that that creates envy for those of you who are on other systems, and we're trying to find bridges, so sorry TJ.

[TJ] I'll put just a small quick button on that: Node has static probes that are built into the binary that work today with tools like SystemTap and ktap. They don't have the facility to do what the rest of DTrace (and ETW, for that matter) can do. So you can get a certain amount of basic information out of Node when we compile the binary, but that's as far as you can get. Cool.

We have some questions up here.

[audience] For those of us that write binary modules.

[TJ] Oh! God.

[audience] Yes, this question was going to happen. If we want to remain compatible from 0.8 and 0.10 when 0.12 hits, whenever that happens, and we're using nan right now, do you recommend not using nan? I know you talked about the C interface layer. Are we going to have to rewrite twice? What's the sadness gonna look like?

So, first off, nan is a perfectly acceptable short term solution to this problem. I've worked with Rod Vagg, who started that, and it's been taken up by the community. nan is just a C++ header that fills this gap for some things, along with a few patterns that other people have found useful and helpful as they've been developing things.

Their intention is not to provide backwards compatibility at any point in time. If you talked to Rod about it, he's happy to break compatibility going forward. The specific purpose of the Node add-on layer is to solve that problem once, at least as much as we can solve it, for 90% of the use cases of binary add-ons.

The module is going to live in NPM and it's going to work on 0.8, 0.10, and 0.12. And then, once 0.12 branches, it's going to be included in Node core, and it's going to work from there onward. So, if you write your module and start using it today against the C API, you'll be able to compile and start shipping your modules based on that interface, and when it becomes a first class feature after 0.12, you don't have to worry about it and the problem's gone.

[audience] When does that module arrive?

That module is on GitHub right now under my account. It will end up back under Joyent's umbrella eventually, but it's tjfontaine/node-addon-layer, and if you do -test, you'll see a little example module that I did around it. It will be published into NPM after I do 0.12; then I can finish off the rest of the add-on layer and do that.

This is a dependency chain thing.

[audience] I saw you guys recently had a port by IBM to the POWER chip, right? So are you guys a PartnerWorld with IBM at this point?

Okay, so IBM on their own terms decided that they wanted to make Node work on the POWER chip, which is great. I love to see that kind of activity from the community, but the secret sauce there is mostly around getting V8 to work on POWER, and not around Node. As it happened, we had other people doing AIX ports of libuv and all the other parts around that. So for Node to support POWER going forward as an official port, IBM and Google will have to get together and make POWER an official backend for V8.

And once it's there, it will all just fall out of Node and we're happy to support that. It's just the same way that Node works on ARM because Google has interests in developing it for ARM.

[audience] This was leading to another question, which was about security. Have you guys run any security tools against your source base to see how many vulnerabilities you have? Clearly, I'm very happy he's sharing my social security number with the people taking over Crimea…

Right. So Node itself having a very small surface area makes me lucky, as someone working with an open source project, to not have to worry about too many security issues. Node has had some, and we'll see others in the future. People do run static analysis passes over the top of Node and we've responded to those requests. As far as I know, and as far as has been communicated to me, there have been no active attacks against Node in that respect.

So we work really closely with people who are identifying security problems, and we work hard to make sure that that's all working for you. The bigger and harder question is not really about Node itself but about the module ecosystem. There, Adam Baldwin from the Node Security Project actually has a mechanism in place to audit packages. He has a team of people and works with the community to audit packages, and when people find security issues in packages, they do responsible disclosure to those module authors. They also offer a REST API for you to validate your module and version to find out if there have been any recent vulnerabilities announced about it.

So Node itself is very concerned with security, and we work really closely with people who are also concerned about that. But the concern in the wider conversation needs to be about the ecosystem and the responsibility of authors of popular modules to respond to vulnerabilities that are found.

[audience] I know that Homeland Security actually subsidizes the security of open source software. Have you had any contact with them about giving you [xx]?

I personally haven't had any communication with the NSA and Homeland Security, or if I did, I'm sure I wouldn't be allowed to tell you. If anybody wants to sponsor security work around Node, I welcome it, and if Homeland Security wants to participate and fund people to work on it, more power to them. I want to work with everybody, so.

[audience] Thanks.
[Bryan] Yes, and I would also just point out that, again, it bears re-emphasis that Fedor, who is now the world's most famous security nerd thanks to CloudFlare, is on Node Core. So we've got people who are very, very, very security conscious on Node Core.

[audience] Hi. I'm new to Node.js. All I did was Express, but my friends know that I used a lot of .NET frameworks and everything like that, and they just told me about Node.js tools for Visual Studio.

And I know you just kicked 125 guys out of the .NET framework and told them to use Node.

I think that's the way to put it, I like that one. I'm going to use that one.

Would you consider telling your guys that use .NET, hey, there are the tools for Node.js with the .NET framework, so you could?

Yeah, one of the things that I meant to touch on in the presentation and I didn't, is that at Dow Jones, we let people use whatever they want. So if you want to use vi or Sublime or the Visual Studio tools or whatever, whatever makes you a faster developer is going to make you a faster developer. And I can't speak to the Visual Studio tools because I was the Windows guy, but doing Java and now I'm pretty much a Mac and Node guy, so I don't know, I've heard tepid things about it.

Nobody was amazed with it, a lot of people even within that .NET group, people were using Sublime and all different manner of tools. So I don't think there was, I don't think any one editor in that realm won for those .NET developers, but…

[audience] So, a question about our community. I haven't seen anything really grow like Node.js from a community perspective or a platform perspective since, really, Java, and it's really amazing, the community and the people that are adopting it. What is Joyent doing, and what are you doing, to grow the community, as well as protect it from the kind of bloat that Java ended up having in its JCP process, and all that kind of stuff?

So the first thing that we do for growing the community is stuff like this. This is a perfect example of how Joyent goes out there and wants to make sure that it's clear that it's not just Joyent that's running this; Joyent doesn't run Node.

Node's an open source community, they pay me to be the project lead, but we do operate on our own terms as a project inside of Joyent with a lot of help and support from really talented engineers that exist at Joyent. I get to talk to really smart people about how we developed those APIs and how we debug and fix problems.

So it's a real benefit for Node to live inside of Joyent in that respect. But because we're not in a foundation mechanism, I, as project lead, have the ability to say yes or no to a feature without worrying about repercussions from somebody wanting to walk off the board because they didn't like my decision. That's the way it works, and for a project like Node, because it has a small surface area, it actually limits the abuse of any kind of process that exists around there.

That's not to say that I don't work with companies to find out what features they need. That's also what this road show is about, to make sure that we know what modules are going to be out there, so talking with MasterCard and PayPal about how they want to see the TLS and Crypto APIs improved is a big part of this as well.

So we're working to prevent bloat by just making sure that we're talking to people who are using Node to figure out where Node should be exposing things that we currently don't expose and clearly should. It's really easy to identify those things, because if Node's not good at it, we won't do it. But say Node needs something like that to implement some other feature; let's say, hypothetically, we had to implement HTTP/2 or SPDY in terms of Node. Any kind of facility that Node itself needed to implement that, and that couldn't be done in a user space module, Node would probably nominate to be included back into core.

We also look at things like binary add-ons; a popular binary add-on is a good indicator of something that would end up getting pulled back into core. In 0.12, the VM layer was changed underneath the hood. VM for Node is now implemented in terms of contextify, so if you're using things like jsdom, the dependency on contextify isn't there any more. You can just use Node 0.12, and that's one less binary module that you have to deal with.
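For readers who have not used the built-in `vm` module mentioned here, a minimal sketch of running code against a separate global object looks roughly like this. The `sandbox` contents are invented for the example, and the snippet says nothing about how jsdom itself uses the module.

```javascript
// Sketch: evaluate code against an isolated global using the built-in vm module.
const vm = require('vm');

// After createContext, the sandbox object backs the global scope of the new context.
const sandbox = { count: 2 };
vm.createContext(sandbox);

// Code run in the context sees `count` as a global and can define new globals.
vm.runInContext('count *= 21; var label = "answer";', sandbox);

console.log(sandbox.count); // 42
console.log(sandbox.label); // answer
```

Note that the Node documentation describes `vm` as a way to get a separate global scope, not as a security boundary for untrusted code.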

So it's a give and take of identifying what's popular in the ecosystem that people are having problems installing that Node can solve that problem for you.

[audience] Hello, so this one goes out to Scott. You started comparing the dependency injection framework that you had inside Tesla to continuation local storage. Can you elaborate on that please?

I haven't used CLS that much, but I know that on the most recent NodeUp they were talking about being able to pass things around, or, in the context of domains, to destroy a bunch of stuff when you're done with that thing, and it's just easier to control. It goes towards solving that problem because, at the framework level inside of Tesla, we know. If you just require something, obviously we're not going to know about it, but if it was dependency injected, because we spawn all that, we resolve that dependency tree when the request is instantiated.

It means we know about everything that you've created and we can easily destroy it, clean it up, or give you a reference to it at any point in that tree or any point in that request, so that's all I was saying. Because there are similar problems like that in other places, where people wind up taking something and passing it through just to have a handle on it.

[audience] What do you use to do that? Is it internal?

Yeah, yeah, it's part of—it's part of the framework, that's all. There are parts of the Tesla framework that are kind of borrowed from other open source places. So, for example, our router is very, very similar to Express's router because it's based on Express's router, and some stuff is just a plug-in, like most of the framework is plug and play.

It comes with a logger which is Winston, but you can replace it with any other, and if we wanted to replace the default under the hood, you could, but where was I going with this? I don't remember.

[audience] Dependency injection.


But that piece, the dependency injection piece, is totally in house, and totally proprietary to Tesla.

It's not modeled on anything else. In fact, we searched when we were building it many, many times, and only like a year after we put in dependency injection, did we actually see a couple of libraries appear that did dependency injection. I'm sure there's a couple of more that maybe do it just as well as we do it now, but at the time it was pretty unique, and still when I looked there was a couple of frameworks that did it kind of like Spring does it, where you configure all the dependency injection externally in a configuration, and I prefer ours much better because right there, you say like my constructor takes argument A, B and C, and then you say for A, B, and C I want to inject 1, 2, and 3, and you do it right there in the code.

So when you're developing a module, you don't need to know about the whole ecosystem or you don't need to go to this config file that represents the entire dependency tree, you're just looking right there, you just care about your code.
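The in-code style of dependency injection described above can be sketched roughly as follows. This is not the Tesla framework, which is proprietary and not shown in the talk; `inject`, `Container`, and the service names are all hypothetical, and the sketch does not handle circular dependencies.

```javascript
// Sketch: dependencies declared next to the constructor, resolved per request.
// `inject`, `Container`, and the service names are hypothetical.

// Attach the list of dependency names to the constructor itself.
function inject(Ctor, dependencyNames) {
  Ctor.$inject = dependencyNames;
  return Ctor;
}

class Container {
  constructor(registry) {
    this.registry = registry;   // name -> constructor
    this.instances = new Map(); // per-request cache of created objects
  }

  resolve(name) {
    if (this.instances.has(name)) return this.instances.get(name);
    const Ctor = this.registry[name];
    if (!Ctor) throw new Error(`Unknown dependency: ${name}`);
    // Resolve the constructor's own dependencies first, then build it.
    const deps = (Ctor.$inject || []).map((dep) => this.resolve(dep));
    const instance = new Ctor(...deps);
    this.instances.set(name, instance);
    return instance;
  }

  // Everything created for this request is known, so it can be cleaned up.
  destroy() {
    for (const instance of this.instances.values()) {
      if (typeof instance.close === 'function') instance.close();
    }
    this.instances.clear();
  }
}

class Logger {
  log(message) { return `[log] ${message}`; }
}

class UserService {
  constructor(logger) { this.logger = logger; }
  greet(name) { return this.logger.log(`hello ${name}`); }
}
// The dependency is declared right here, beside the code that needs it.
inject(UserService, ['logger']);

const registry = { logger: Logger, userService: UserService };
const requestScope = new Container(registry);
console.log(requestScope.resolve('userService').greet('Ada')); // [log] hello Ada
requestScope.destroy();
```

Creating one `Container` per request is what gives the framework a handle on every object created for that request, which is the cleanup property compared to continuation-local storage in the earlier answer.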

[audience] Thanks.

[audience] Hi, I guess this is for anybody, but there's been some discussion on the web recently about the inefficiency of putting a function literal inside an object literal, specifically in a hot code path. It really made me think about the callback cycle from MongoDB and nested callbacks: if you define those as function literals, potentially, that could make the V8 VM have to recompile those on each run of the database query. And I was just wondering if you guys had any suggestions or thoughts, or how you really do this in terms of callback factories.

What do you generally do, if you need a callback that needs to take some context, how do you generally do that?

Let me rephrase slightly for a part of the question. Do you guys use any particular control flow libraries in your development process, or do you just really love closures when you're writing stuff?

We actually get along pretty well with just after_after, and I know we'll be put on the cross for this, but we use CoffeeScript so we do…

[Bryan] Put him on the cross, please, I need a cross, we need a cross on the stage.

So, we primarily use async to do that kind of stuff and to manage that. We heard about some very interesting stuff talking to TJ this morning, like a library they do called vasync, which helps with debugging, because that's one of the pains in the butt even with async: you jump from this part of the waterfall to this part and, oh, now there is a bunch of stack trace in between that has nothing to do with you.

And they, apparently have some stuff that helps around that so, we'll be looking into that but otherwise we primarily use async. And especially since we've converted a lot of existing developers, this is certainly like the primary complaint they have when they look at the code, so it's not like we have a mix, like everyone jumps to async right away because they don't want to deal, like they come from somewhere else and they're like there's no way I'm going to do this callback craziness.

So, I mean, you're still doing it but it's much neater.

Roberto, do you have anything to add to that?

No, I think it's the same thing.

What I'll say is this: first and foremost, there is a lot that you can do to make the V8 JIT happy. What you should be concerned about is making yourself happy, and writing code that you want to maintain, OK? So, until you've identified that the only way to solve the problem that you have right now is to go into the depths and understand how V8 is optimizing your code, don't worry about it too much.

That being said, closures can be a nightmare for very many reasons, not the least of which are resource leaks, which feel like memory leaks most of the time because you didn't realize how much scope you were carrying along. At Joyent, there's a code style limit: they believe fully in the 80 column thing. So, 80 columns (then there's a conversation around white space, which I won't get into), but 80 columns makes it pretty easy for you to identify if you've gone too far into closure land, because if you can't fit your function in 80 columns, then you probably need to refactor. And the great thing about refactoring your code at that point and coming out of the closure mechanism is that you're not using closures, you're thinking more of a stack-based mechanism, and you're only passing the state that you need to carry around with you.

So, you kind of, by nature of having this code style there, you solve some of the closure problems that you actually would run into in production. Go for it.
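One way to picture the refactoring described here is to pull nested callbacks out into named top-level functions that receive an explicit state object. The sketch below is only an illustration: `fetchUser` and `fetchOrders` are hypothetical stand-ins for real asynchronous calls such as database queries, and `bind` still allocates a small function per call, so this is about limiting captured scope rather than avoiding allocation entirely.

```javascript
// Sketch: replacing nested closures with named functions that receive
// only the state they need. `fetchUser` and `fetchOrders` are hypothetical
// stand-ins for real asynchronous calls such as database queries.

function fetchUser(id, callback) {
  setImmediate(callback, null, { id, name: 'user-' + id });
}

function fetchOrders(userId, callback) {
  setImmediate(callback, null, [{ userId, total: 10 }, { userId, total: 5 }]);
}

// Each step is a top-level function; state travels in an explicit object
// instead of being captured from an enclosing scope.
function loadSummary(id, done) {
  const state = { id, done };
  fetchUser(id, onUser.bind(null, state));
}

function onUser(state, err, user) {
  if (err) return state.done(err);
  state.name = user.name;
  fetchOrders(user.id, onOrders.bind(null, state));
}

function onOrders(state, err, orders) {
  if (err) return state.done(err);
  state.done(null, summarize(state.name, orders));
}

// Pure helper, easy to test on its own.
function summarize(name, orders) {
  const total = orders.reduce((sum, order) => sum + order.total, 0);
  return { name, orderCount: orders.length, total };
}

loadSummary(7, (err, summary) => {
  if (err) throw err;
  console.log(summary); // { name: 'user-7', orderCount: 2, total: 15 }
});
```

Because every step sits at the top level, nothing nests deeper than one indent, which is the same pressure the 80 column limit applies.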

[audience] I was really just trying to get a sense of how you pass that minimal amount of state to the callback that requires it, outside of maybe nesting callbacks again and again and again. And I guess the async library is one method.

Yeah, async. Some people use async for that mechanism, and waterfall specifically helps if you're chaining a bunch of asynchronous events together.

There's also a little known one, async.auto, which I think is pretty awesome. I don't know how crazy the code might get underneath, but you can come up with crazy dependency trees of stuff and let async handle in what order everything is done.
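The idea behind `async.auto` is that each task declares which other tasks it depends on, and a runner works out the order. The sketch below is a hand-rolled illustration of that idea and not the async library's API or implementation; `resolveOrder`, `runAuto`, and the task shape are all made up here. It also runs tasks one at a time for simplicity, whereas a real runner can start independent tasks concurrently.

```javascript
// Sketch of dependency-ordered task running, in the spirit of async.auto.
// This is a hand-rolled illustration, not the async library itself.

// Returns task names in an order where dependencies come first.
function resolveOrder(tasks) {
  const order = [];
  const state = {}; // name -> 'visiting' | 'done'

  function visit(name) {
    if (state[name] === 'done') return;
    if (state[name] === 'visiting') throw new Error(`Cycle at ${name}`);
    if (!tasks[name]) throw new Error(`Unknown task: ${name}`);
    state[name] = 'visiting';
    tasks[name].needs.forEach(visit);
    state[name] = 'done';
    order.push(name);
  }

  Object.keys(tasks).forEach(visit);
  return order;
}

// Runs tasks in dependency order, passing earlier results to later tasks.
async function runAuto(tasks) {
  const results = {};
  for (const name of resolveOrder(tasks)) {
    results[name] = await tasks[name].run(results);
  }
  return results;
}

const tasks = {
  report: {
    needs: ['user', 'orders'],
    run: async (r) => `${r.user}: ${r.orders.length} orders`,
  },
  orders: { needs: ['user'], run: async (r) => [`order for ${r.user}`] },
  user: { needs: [], run: async () => 'ada' },
};

console.log(resolveOrder(tasks)); // [ 'user', 'orders', 'report' ]
runAuto(tasks).then((results) => console.log(results.report)); // ada: 1 orders
```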

But beyond that, it really depends on your application, on what you're doing, how much state there is and how you want to handle it. Maybe what you want to do is define your state in terms of a class or something like that, where you're doing it…

[audience] Yeah, that was the other thing, you put it on the prototype chain or something.

Exactly. And also, as far as performance and function literals go, if you just follow prototypal inheritance for your JavaScript, if you just live that and breathe that, V8 will do wonders under the hood. It's good style, and when you follow that pattern, V8 does wonders for you. So again, just write code you want to maintain.
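A small sketch of the difference being discussed: a method placed on the prototype is created once and shared, while a function literal assigned inside the constructor is created again for every instance. The two counter types are invented for the example.

```javascript
// Sketch: prototype methods are shared; constructor-assigned literals are not.

function CounterWithLiteral() {
  this.count = 0;
  // A new function object is allocated for every instance.
  this.increment = function () { return ++this.count; };
}

function CounterWithPrototype() {
  this.count = 0;
}
// One function object, shared by every instance through the prototype chain.
CounterWithPrototype.prototype.increment = function () { return ++this.count; };

const a = new CounterWithPrototype();
const b = new CounterWithPrototype();
console.log(a.increment === b.increment); // true

const c = new CounterWithLiteral();
const d = new CounterWithLiteral();
console.log(c.increment === d.increment); // false
```

State lives on the instance and behavior on the prototype, so a callback can be passed as a bound method without carrying an enclosing scope along with it.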

[audience] Thanks.

[audience] Alright, hi, my question is about Harmony features, I guess that's kind of relevant here. I would like to know if there are any plans for enabling Harmony permanently, by default.

Okay, I'm going to ask that question first: is anybody up here running Harmony or any kind of advanced ES6-like features in production?

I mean, we've played with it… we've turned it on and messed with it, and we're not seriously considering it in production right now. But as part of this callback thing, we're also asking: should we go to Promises, should we do generators? The same battle that's playing out in the community is playing out at our company.

Same thing. We've been experimenting with Koa a little bit, which is a different framework, but nothing in production yet.

You guys? No, no that's fine.


So as far as Node enabling Harmony by default for everybody else, we're not going to jump in front of V8. When Google decides that V8 should have that enabled by default, that's when Node's going to get it enabled by default.

We jumped in front at one point in time for typed arrays and array buffers. The dangerous part of that is we picked an implementation, and then we had to make sure that when V8 went to enable that that we were actually within spec, and that people who wrote code on that were going to be able to work going forward. But beyond that Node's not going to get ahead of what Google wants to enable for V8, especially since you can enable it on the command line for that mechanism. If you're doing your own builds of Node, you can actually pass a configure option now such that you can enable it in your build, so you don't have to worry about always having to remember to do that.

But even if you do that, my suggestion to you would be not to run that in production, because it's still all brand new technology as far as Google is concerned, and it takes a lot of iterations for a lot of those things to become acceptable for production use. And if you stay in the ES4, ES5 world, your life will be a lot happier, plus your modules will be able to be used by more people.

[Bryan] On those new language features too, be very cautious, because often the VM implementers just want to get them working. Making them perform is a lower priority, since they're viewed as kind of syntactic sugar, so if you're concerned about performance, be sure to measure as you are actually deploying those new language features.
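As an illustration of that "measure it" advice, here is a minimal timing sketch. It assumes a Node version that has `for...of` and `process.hrtime.bigint()`; the function names and the array size are made up for the example, and a micro-benchmark like this says little about a real workload, so treat the numbers as a starting point only.

```javascript
// Two ways to sum an array: a classic indexed loop and ES6 for...of.
function sumWithForLoop(values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) total += values[i];
  return total;
}

function sumWithForOf(values) {
  let total = 0;
  for (const value of values) total += value;
  return total;
}

// Hypothetical helper: run fn `runs` times and report the elapsed wall time.
function timeIt(fn, values, runs) {
  fn(values); // one warm-up call before the clock starts
  const start = process.hrtime.bigint();
  let result;
  for (let i = 0; i < runs; i++) result = fn(values);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { result, elapsedMs };
}

// Small demo; results vary by machine and by Node/V8 version.
const sample = Array.from({ length: 100000 }, (_, i) => i);
for (const fn of [sumWithForLoop, sumWithForOf]) {
  const { elapsedMs } = timeIt(fn, sample, 50);
  console.log(fn.name + ': ' + elapsedMs.toFixed(2) + ' ms for 50 runs');
}
```

Which variant wins depends on the V8 version, which is the point being made: measure on the runtime you actually deploy.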

Yeah.

[audience] Hey Roberto, how would you go about, like, starting an innovation lab like you guys did, and how do you start that first project with Node?

I think the idea was, we saw that the whole company has an awesome R&D group, but the cycle of innovation in those R&D groups was not great, right? Just because when you get to a certain size, it's hard to be extremely agile, right, and experiment with new things. So we just went to the CTO at that time and the CIO, and they were very open to the idea of creating a small group, right? So the idea was almost like, let's start a startup, right, with an $11 billion angel, which is quite cool, right?

So we said, let's start small. We actually started with four, five people, right? Surge is here, he was one of the first ones, and we decided: let's pick one project that we can be successful with and make the name of the lab on that one project. So we said mobile was high visibility, huge importance for the company. We said let's start small with a couple of functions in that, but make it so transformational for the company that it would just give us the cred to say, OK, now we can do this damn semantic search that no one has done yet, right?

And that's the way we preach it, and we started small, just like a startup, right? I mean, everyone in that group knew exactly everything that was happening on the stack, from the OS to the UI, right? Crank it, crank it, crank it, make it fast, put it out there all the way to production, that's the other thing we did with the lab that was important. It's not a research area, right?

It's what we call an incubation area. So we get something from idea all the way to production. And we did a lot of devops because our stack was obviously not compatible with what was out there. We were an Oracle, J2EE shop. So we said, okay, let's devops, give us a couple of VMs. We're going to sit behind your security, so we're not filling you up with authentication, authorization.

And it worked quite well. I mean, I think we need more failures in the lab, because that should be part of the research, to fail. But I think it's that: start small, very focused on one project, make that your success story, and then you get the cred to start growing. So that's the idea.

[audience] Hi. I have two questions for the Node team. As we know, one of the things JavaScript is not strong at is numerical calculations, and that makes Node.js hard to compare to other backend technologies, because most projects have business requirements, and those business requirements very often rely on the ability to perform calculations.

So, do you guys plan to compensate for that by providing core C libraries or core capabilities as alternatives, to make up for that lack of support?

So, go ahead, is there more to that?

[audience] Yeah, sorry. Second question: is there any way, maybe I overlooked it, to package modules into native builds or native executables so that, let's say, you can protect part of your solution that you don't want to open source?

Okay, so I'm going to answer that last part first. Generally, I call that bundling. Basically, we're going to attach all of that into the Node binary and then ship that all as one thing. So if you have a tool that you want to distribute and it's written in Node, go ahead and do that.

If your decision at that point in time is to then pack it and do all kinds of cryptographic fun with it to make it protected, that's up to you, that's at your discretion, but that's being actively looked at right now, and there's a couple of implementations, one by Bradley Meck. It's kind of interesting. The first piece to this actually, in my estimation, is to actually fix the module loader to understand what it means to operate in that kind of environment, and what also falls out of that, is actually understanding how to load something that you might look at as a jar today or an egg file for Python, which is to wrap up a module inside of a zip or a tarball at that point in time.

So that's the first step of that process, understand what that API looks like and make sure modules behave in that environment, and then once you understand what that is, then you can easily tarball up the whole project, and put it in the Node binary, and then it knows how to launch it when it's in that pursuit.

[audience] Is that upcoming?

That's under discussion right now. It's not going to be in 0.12. There are people working on it to make it work in 0.10 and 0.12 as external projects, so there are people doing that, and it's something we need to get right for everybody, but it's a conversation that's happening.

Beyond that, as far as high precision and bignum kind of conversations, there are modules out there for some of that stuff. Part of the problem is JavaScript. Some people say it's a problem, some people will say it's a good thing. Part of the problem would be that JavaScript doesn't have operator overloading, so it's not like Node can just make data structures that handle higher precision and handle big numbers. It's difficult from that perspective. What might be a little bit of an easier conversation is this: some of that is being solved in ES land and could take like two decades to actually show up in JavaScript.

But what's easier for me to say is, in a hypothetical world where FFI were included as a first class feature inside of Node: if Node ships FFI, Node also ships OpenSSL, which also understands how to do high precision, big numbers. And if somebody wrote a JavaScript library that just utilized FFI to reach into the OpenSSL that was already distributed with Node to do that, then that's the easiest way, zero compilation required, and it's a good mechanism to do that.

So what I want to be able to do for Node, is be able to just identify the minimum subset that I actually can do that reaches everybody's needs. If Node itself needed to do high precision or big numbers as a first class feature, then everybody will get the benefit of that, but I don't see a future right now where that's the case.
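To make the precision point concrete, here is a minimal sketch of what a userland big-number helper has to do. The function name `addDecimalStrings` is invented for this example and is not from any particular module; it only handles non-negative integers written as decimal strings. Because there is no operator overloading, the addition is a function call rather than `a + b`.

```javascript
// JavaScript numbers are IEEE 754 doubles, so integers above 2^53 lose precision:
// 9007199254740992 + 1 evaluates to 9007199254740992 again.

// Hypothetical helper: add two non-negative integers given as decimal strings,
// digit by digit, the way schoolbook addition works.
function addDecimalStrings(a, b) {
  if (!/^\d+$/.test(a) || !/^\d+$/.test(b)) {
    throw new TypeError('both arguments must be strings of decimal digits');
  }
  let result = '';
  let carry = 0;
  let i = a.length - 1;
  let j = b.length - 1;
  while (i >= 0 || j >= 0 || carry > 0) {
    const digitA = i >= 0 ? a.charCodeAt(i) - 48 : 0; // 48 is the char code of '0'
    const digitB = j >= 0 ? b.charCodeAt(j) - 48 : 0;
    const sum = digitA + digitB + carry;
    result = String(sum % 10) + result;
    carry = Math.floor(sum / 10);
    i--;
    j--;
  }
  return result;
}

console.log(9007199254740992 + 1);                       // the double rounds
console.log(addDecimalStrings('9007199254740993', '1')); // exact, as a string
```

Real big-number modules follow the same general shape, with method calls standing in for the arithmetic operators.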

[audience] Cool.


[audience] Hi, so could anyone tell me more about async listeners? Like what they are, and how do they relate to domains, if they do?

Do you guys, I mean, Scott knows about async listeners somewhat because he's been hearing about continuation-local storage. Async listeners is an interesting beast that's being added in 0.12 as part of our tracing framework.

Basically, it's exposing an implementation detail of Node from our C++ layer to kind of help trace all asynchronous operations that Node performs. That's a big statement to understand, and it comes with big requirements. I'd be happy to talk to you about what that means offline, because this is a very difficult conversation to actually have. Its relationship to domains is also tenuous.

My suggestion to people here is to not necessarily use domains, and not necessarily approach something like error handling with the concept that you can fix an error by just adding more code around it. Because if you could have fixed the problem with just more JavaScript, you would have just written that JavaScript in the first place.

A colleague of mine, Dave Pacheco, wrote an error document very recently that's on joyent.com. It talks about the difference between operational errors and programmer errors, and what that means for your Node process and for any software development mechanism that you're actually using. If you're interested in those kinds of conversations about domains and async listeners, I would really suggest reading that document. I think you might have a different take on what it would mean to do that inside of Node.

[Bryan] You guys, has anyone read that document that TJ is referring to? I don't know if you've seen this. If you Google Node.js and error handling, the first hit is like some stupid Stack Overflow question from like 2009, but the second hit is this document that TJ is referring to. This is a must read if you develop Node.

[Bryan] Error handling sucks, it just always sucks, errors suck, but this document is very important for providing a taxonomy for errors, and in particular, for distinguishing operational errors, errors that happen in nature that your program needs to deal with, from programmatic errors, errors in your brain because Eve bit the apple and we have original sin, and you are a flawed person.

[Bryan] It's very important to understand the difference between those two, and that helps, I think shed light on why we believe that domains are really not the right answer for the programmatic error, so if you only get one thing out of this, I would say check out that document, it's a really terrific document.

Absolutely.

Do you have another question?

Other question?


[audience] This got mentioned a little bit, but we've been nervous about the relationship between Node core and the public NPM registry, and the stability issues that have been happening with the public NPM registry. So I was interested to hear Scott talk about running a private NPM registry, and I'm wondering whether you would recommend that any organization run their own private NPM registry, if that's a good idea, and more generally whether TJ has an opinion about how NPM registries should be run?

[TJ] Yes, you guys talk first about your usage of NPM.

Yeah, I mean, for us it's not that hard to run NPM. The documentation could be better, because I don't think Isaac was intending for a bunch of people to be just running it all over the place, but it's essentially all in CouchDB, and it has some nuances that you will need to learn over time. For example, the application code itself is in the registry, as is usual with Couch, so sometimes you will be syncing that database to your local one and you'll get updates in code, and NPM will work differently, right?

So, stuff like that can catch you and it has caught us. But really, we've had one, one and a half guys working on it for the past year, so it's not like resource intensive to keep it up and running, he's basically devops, and if there're any problems, if something doesn't sync or get deployed, like he's the guy people go to. And he's running two registries across several servers, because we have like the hot backups and stuff.

So yeah, it's not hard to do. I don't know if I'd recommend it, especially if you're getting started, if you have the time, I might wait a little bit to see what Isaac is doing with NPM and see where it's going, that's still a pretty hot topic.

For us, just addressing the stability, like, we just checked our Node modules into our git repository and that solved it. And then running any builds or deploying isn't dependent on NPM at all.

That's what we do at ADP as well. Check them in our private Git, that's it.

So from Node's perspective there's not a lot of real tension between Node and NPM. Isaac and I are good friends, we spend a lot of time working side by side on things. I take it very seriously though when there are stability issues because people come and complain to Node about those problems, and Node ships NPM by default.

But to be clear, while the ecosystem and the community of people creating modules is amazing, the novelty is actually in the module loader of Node itself: the idea that you can have multiple concurrent implementations of the same module, at different versions, and that doesn't pollute, and you don't have dependency conflicts at that level.

So that is inside of Node, and as far as Node is concerned, it doesn't really care how those files got there on disk and got to the right layout. So there is a relationship between Node and NPM, but it's a project relationship. As far as the Node binary is concerned, it doesn't really know anything about any of that.

So when these companies up here are working on solving those problems, it's a problem that everybody has, and it's a problem for Joyent as well. The deployment process is where most people up here have a problem, and if there were an alternate solution for that mechanism, it would be open sourced by the community in general and you'd be able to avail yourself of it.
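The layout the loader cares about can be sketched as follows. This is a simplified model of the lookup, not Node's actual source: the function name `candidateDirs` is invented, it uses POSIX-style absolute paths, and it leaves out the global folders that real Node also searches. In a running program, `module.paths` shows the real list.

```javascript
const path = require('path');

// Simplified model: starting from the directory of the requiring file,
// walk up to the filesystem root and collect every "node_modules" directory
// a bare require('name') would be looked up in, nearest first.
function candidateDirs(fromDir) {
  const dirs = [];
  let current = path.posix.resolve(fromDir);
  while (true) {
    // Avoid producing ".../node_modules/node_modules".
    if (path.posix.basename(current) !== 'node_modules') {
      dirs.push(path.posix.join(current, 'node_modules'));
    }
    const parent = path.posix.dirname(current);
    if (parent === current) break; // reached the root
    current = parent;
  }
  return dirs;
}

// A dependency living in /app/node_modules/a looks in its own node_modules first,
// so it can carry a different version of a module than the top-level app uses.
console.log(candidateDirs('/app/node_modules/a'));
console.log(candidateDirs('/app'));
```

Because the nearest directory wins, two dependents can each resolve their own copy of the same module at different versions without conflicting, regardless of which tool put the files on disk.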

Okay, thanks.

[audience] This is especially a question for Scott. I was just wondering, you said that you had about eight years of JavaScript experience from the frontend, I suppose, and I was just wondering whether on your team you had converted frontend devs to Node.js devs, and what was the biggest road block, challenge, and if anyone else had done that, I'd love to hear from you as well.

That's an awesome question, and yes, we have, probably two, well, I don't know if it's to a lesser extent, but we definitely have done it a lot of times, many times to great success. I know a couple of frontend devs that became better producers of more stable code than people who had been doing backend.

And I think part of the reason for that, is knowing the idiosyncrasies of JavaScript if you're really good at JavaScript is going to make for much less buggy, more stable, more straightforward, readable code. And you can, kind of, take it step by step with the JavaScript developer, right, so they understand that front layer and you, kind of take it a little bit further back each time.

So it can work, it's just a different—you're just coming at it from a different angle.

Good. Same thing, we've got a couple of live examples sitting here. They used to do JavaScript on the frontend and are now doing the server. And that's phenomenal to be able to kind of go from one to the other with one stack. And in the case of our mobile, it's funny because we use Mongo a lot.

It's almost like you go JavaScript from the glass to the persistence, right? One language, which is unprecedented, right? I mean, to be able to do that: MVC server, app server, all the way to the aggregation or map-reduce on your database.

Yeah, like I said in my presentation, like all of what would traditionally have been done by shell scripts or other task management things, or RPMs or whatever, all JavaScript everywhere. I don't ever touch anything that's not JavaScript.

Yeah, the deployment script is JavaScript, it's like everything is JavaScript.

Actually like for us, Craig wrote a lot of the framework that we work in and the general design patterns. But we have like a principle of just having an engineer really own a feature end to end, so they build out, like, the client facing UI, the APIs, support it, make the model changes. And we always had to, sort of, have our engineers bounce between JavaScript and sometimes Ruby as well, and so they're all relatively familiar with it, but having a sort of existing design pattern they are working within has been really, really helpful. It has made it easy to bring people who are Rubyists into our CoffeeScript land.

Yeah, and we do the same thing, just like in that open source model, like anyone can contribute anywhere. So if the job in Jenkins, which is really just Jenkins calling a Node process, right, if that blows up, it's not only on the guy who developed that. Any developer can just fix it and submit a pull request, and then all of a sudden that process works for everyone. That's good also when people are away from their desk or whatever: if it's kind of a development emergency where people can't develop or release something for half an hour, just anyone can pick that up and work on it.

[Bryan] There'll probably be one more question here, and then I think we're going to be sticking around, but I just want to allow people to get home. If you need to split, don't think you're offending us.

[audience] There were comments that two to six servers were supporting a very big workload. Can you guys tell us a little about how those are configured, the size of the machines, and if you're using a single Node process per host, or if you're using the cluster module, or multiple Node processes? How do you align those with the number of cores or CPUs per host? Or if you have a really big cluster of database and cache behind that, and Node's just throwing packets.

We have very simply just used Heroku's 2x Dynos because we have our app, really the load is on the API, but we also lighten that load by using the caching layer, and that's just a Heroku add-on, so our server setup is very simple, and we just probably don't have nearly the scale of other companies to need to scale up the servers there.

I mean, at this point, we've seen all our Node apps' response times exactly mirror our API response time, and so even with getting, in our case, many, many thousands of requests from the net, which is relatively high for us, it'll just keep on trucking along.

I mean, everything is VMs, fairly small: four cores, 16 GB. Or 32, 16 GB, I think. And the cool thing we did is that they run everything. We don't have a separate database cluster. It's one stack, from the OS to nginx, Node, MongoS, so the database runs on the same machine, because when we analyzed it, there were actually very different patterns of usage of the machine. So literally, for us, a deploy is one stack, and you just deploy horizontally, right?

All of them look exactly the same. There's no separate cluster for a database and then a separate cluster for the app server, it's just like one thing, and it's just like poop, poop, poop, that right, and it performs extremely well because the patterns of usage of Mongo and Node are very different, and we've seen the same when testing with Cassandra and Redis. Same kind of thing.

So when I say six machines, that's everything: database, app server, API proxy, everything.

So for us, kind of two things. One is comparing to the app we had before, and I'm using WSJ as an example. We had it on between four and six Java servers, but in order to have that actually work, when pages were published we had to statically cache that content, so Java would really be rendering it once and tons of stuff was stored in Akamai. We don't do any of that anymore.

Of course we still keep our resources in Akamai, like images, and stuff, but a lot of the CSS, a lot of HTML is generated in real time, every time for every user because we don't know what preferences or what customization or whatever that user is going to have, and that's one of the reasons we actually switched to Node as well, because in the Java world, because we couldn't do that and because we were just statically generating stuff, it meant that we had to do a lot of the customization on the front end in JS, and we couldn't render the page in the customized way.

As far as cluster, we do use cluster under the hood. We have this thing called TeslaD, which is like a service start/stop, but it also allows you to have some configuration on the machine, and this way the app developer or the product group doesn't have to worry about cluster. They just say, hey, here's this JSON file, I want to use eight Nodes, and that's going to spin up eight Node instances. So for WSJ, is it eight?

I'm looking at the guy who might know. He's saying yes, so that rendering app is eight cores across four machines and those are actually split. We have two discrete boxes and, then two AWS instances that are pretty equivalent to those discretes.
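The general idea of driving the worker count from a config file can be sketched with Node's built-in `cluster` module. This is not TeslaD, whose format is not described here: the config shape (`workers`, `port`) and the function names are assumptions made up for the example.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Hypothetical config; in practice this might be loaded from a JSON file.
const exampleConfig = { workers: 8, port: 3000 };

// Use the configured worker count if it is a positive integer,
// otherwise fall back to one worker per CPU (at least one).
function workerCount(config) {
  const requested = Number(config && config.workers);
  if (Number.isInteger(requested) && requested > 0) return requested;
  return Math.max(1, os.cpus().length);
}

function start(config) {
  // Newer Node versions call this isPrimary; older ones only have isMaster.
  const isPrimary = cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;
  if (isPrimary) {
    const count = workerCount(config);
    for (let i = 0; i < count; i++) cluster.fork();
    // Replace any worker that dies so the pool stays at the configured size.
    cluster.on('exit', (worker) => {
      console.error('worker ' + worker.process.pid + ' exited; starting a replacement');
      cluster.fork();
    });
  } else {
    http.createServer((req, res) => {
      res.end('handled by pid ' + process.pid + '\n');
    }).listen(config.port);
  }
}

// start(exampleConfig); // uncomment to actually fork workers and listen
```

The app code itself stays unaware of clustering; only the small launcher reads the count.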

[Bryan] And speaking for us at Joyent, because we're big users of Node as well: when we first started deploying Node in production, it was a real open question how much CPU this thing was going to burn, it's JavaScript and so on, and we found that every Node service that we have deployed has surprised us by being light on CPU.

DRAM can be much more of an issue than CPU. Right now, the two biggest components in our ecosystem that burn CPU for no goddamn good reason are one's in Erlang, and that's going to get ripped out, and one's in Java, and that's going to get ripped out. Zookeeper is by the devil, for the devil as far as I'm concerned. And in both of those cases, one, we're going to architect away, the other, we're going to come up with a Node-based alternative.

Because I just can't live with a JVM doing this to me anymore, and I've gotten used to the Node world where things are light on CPU. When they're not light on CPU, I can figure out exactly why and I can go fix it, and you get very spoiled with that world. So TJ, you want to get the last word there?

Yeah, just to finish it up. So, when you're looking at doing something like the cluster module or horizontally scaling on the machine, whatever the machine is, physical or virtual, before you do that, make sure there's a reason to do it. If you're seeing CPU usage start to go through the roof, or you're seeing latencies start to go up because you're not able to handle the concurrency of a single process, then yes.

Go ahead and scale out, or maybe you have some rolling update concepts that you want to do as far as zero downtime for new code deploys. Those are good reasons to horizontally scale on a single machine, but don't just jump into it because that's what everybody's saying to do. Make sure there's a real need for it.

Node's actually, as it turns out, pretty good at high concurrency, so just make sure there's a good reason to do it. Second, I'm going to end on a real positive note. Don't use the Node cluster module. Underneath the hood, it's doing a lot of automagic that may or may not help you. In 0.10, each worker is accepting on the listener socket, and if you have eight cores, it may be that you only see four of those processes actually being hot, and you're not actually distributing the load in the way that you are expecting to see it. In 0.12, we changed it.

Now the master's going to accept on that socket and then distribute work out to the children, but the caveat there, of course, is that communication between the master and child is synchronous communication, the messaging layer is JSON, and that's synchronous processing that all needs to happen. If the worker starts to get lost or is getting behind and it stops receiving on that IPC mechanism, the master is going to be blocked sending information to that child.

And that is exactly the opposite of what you wanted to actually have in this implementation. What I suggest for a lot of people when they're hitting that scale where they need to horizontally scale out: put something like nginx, HAProxy, or Apache in front of that, tell Node to listen on a Unix domain socket, and be done with that. If you need to go across multiple machines, do something similar to what ADP does, and put an F5 or some other kind of hardware load balancer across multiple machines, and then on the virtual machine or what have you, scale out with something that you can configure for your load balancing in a more discrete manner.
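Listening on a Unix domain socket is a one-line change on the Node side. Here is a minimal sketch; the socket path, the response body, and the function names are placeholders chosen for the example, and the proxy in front still has to be configured separately to forward to that socket.

```javascript
const fs = require('fs');
const http = require('http');

// Trivial request handler, standing in for a real application.
function handleRequest(req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('hello from node\n');
}

// Start an HTTP server on a Unix domain socket instead of a TCP port.
function startServer(socketPath, onListening) {
  // A socket file left over from a previous run would make listen() fail,
  // so remove it first; ignore the error if it simply does not exist.
  try {
    fs.unlinkSync(socketPath);
  } catch (err) {
    if (err.code !== 'ENOENT') throw err;
  }
  const server = http.createServer(handleRequest);
  server.listen(socketPath, onListening);
  return server;
}

// Example (hypothetical path); run one process per socket and let the proxy balance:
// startServer('/tmp/app-1.sock', () => console.log('listening on /tmp/app-1.sock'));
```

With several such processes, each on its own socket path, the load balancing policy lives in the proxy's configuration rather than inside Node.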

[Bryan] Great and I know there are other questions so I think let's wrap it up now, we're going to be around and we'll be happy to talk as long as you want to talk. So thank you so much guys.

We actually get kicked out of the venue at some point in time.