Node.js on the Road: Ben Acker

Node.js on the Road is an event series aimed at sharing Node.js production user stories with the broader community. Watch for key learnings, benefits, and patterns around deploying Node.js.

Ben Acker, Senior Software Engineer at Walmart, talks about their use of Node.js.

SPEAKER:
Ben Acker, Senior Software Engineer
Walmart

An interesting bit of information is, this is the first time I've given a talk in front of any of my family members. [baby cries in audience] You can see how much he enjoys it. Thanks, coming out to Portugal has been fantastic. Everyone here, the staff of LXJS, everyone we've met, it's just been a really fantastic experience, and I just wanted to say thank you before I get started.

And usually when I start, well, the free radio stations in America are called NPR, and this is my NPR voice. I don't know if there's a really good equivalent here. I usually start off with this just because it's nice and soothing, and it helps to get things off on the right foot. I also had some help translating.

I had some help translating for this. I've got Node all over the place, so I'm just trying to get them all set up. Alright. I did have some help translating this. It used to say "tu quieres Node" (Spanish for "you want Node"), but I had to switch it because I didn't feel that was appropriate. OK, another reason I use the NPR voice is because quite often that's the voice you use to introduce nature shows, and I've got a nature show that's coming up here.

I'm dropping out a little bit, are we going to switch over? OK, you all are going to enjoy this. All right, OK. Is this working? OK, we've got some very vocal members of our team, and there have been a lot of talks about Node at Walmart. Some of them have been by this guy over here too. These are links; this slide deck will be available, and all of these are links to those talks. There's a whole bunch; there's another one from Wyatt at LXJS that I'll add on here.

But they cover a whole bunch of different stuff about things that we've done at Walmart, and examples of what's been going on from the back end to the front end, including lots of our open source libraries, etcetera. And this is where the NPR voice should come in: in the long, long ago.

There's this little small American retail establishment called Walmart, and they created a pretty big empire. They had all kinds of stuff going on, they were selling all sorts of stuff all over the nation, and they created this giant state-of-the-art fulfillment system with straight-up robots doing all kinds of crazy stuff, moving packages all over the place. Then they created a website where they could sell that stuff, and it was OK. Later on, they revamped the website a few times, with loads and loads of people working on it, and later on they decided to go to mobile.

They built a mobile site. It was great, it went really great, except there was no native experience at all. When you think of a large corporation and you think of the stuff that goes into creating some kind of technical thing for that corporation, there's lots of bureaucracy that you have to go through, and this mobile website represented quite a bit of it.

First of all, they just did mobile web. They didn't do any native things at all for iOS, Android, Windows, anything. Second, they had loads of decisions like "well they have a mobile device, they are going to be going into a store, we don't want them to be able to go into the store without having the information for the store, but how do we know which store they are going to go to?"

"Who cares, let's load every single store in the United States before we allow the App to start. That's good, I mean they can wait two minutes, it's cool, it's cool." So decisions like that which obviously is a bad one. Thank you so much for the water, holy cow. So yeah, they ended up building a site. It was bad, they hired an external group to go in, and redo the site.

They redid it, and they made native versions for iOS, they made native versions for Android, and it ended up being pretty good. But it was still based on all of this stuff that was out there. So while they were doing this, they had this services group, which had worked with the original project, that was going to be providing services to these native clients.

So enter Java guy. Java guy's happy. Java guy's creating. He represents the services team as it existed for Walmart mobile services; he's a happy little guy, walking around doing some services for these native applications. Now previously, it was just one team that was doing stuff, right? One team that was doing things, everything was cool, they had services, they had front-end work going on, everything is happy. But now there's a whole bunch more folks that are there doing things. I don't know if you've noticed, but a lot of the Walmart folks have just drawings for slides. I do all my own work, just want to make sure everybody knew.

So now you've got a lot more teams, and some of the teams are external too, so they don't have contact with them. So the Java services guys are getting a little nervous, because these services are built on that old website that was there before, which was never designed to provide services of any kind.

Some of the things that they had to do (these Java guys) start getting a little bit anxious about, because in some cases they are screen scraping the website to provide services for these mobile clients. That's not a very good situation to be in. And then as stuff continues to go on, there are loads and loads of teams clamoring for different types of features, because they all have different product managers, they all have different project managers, and they are getting angry, because there are only so many of them to service this entire corporation building this mobile website. And then everything just goes to Hades: there's a Black Friday, there's an outage for six hours, missing revenue, people are getting woken up at all hours of the night, and this just continues and goes on and on. There are service calls at 3 am. There's all kinds of stuff going on to make the services team absolutely miserable.

So this is the team that we joined. Huzzah! Enter Eran Hammer. Knowing that they needed to get some stuff changed with services, the folks who had been brought in to do the remake of the site for native and everything hired Eran Hammer to come in, and he picked Node as the target platform to use. This was about two and a half years ago, and he just came up with a plan. I have used this slide before, but it's one of my favorite slides; you'll see why in just a moment. So, we've got the old site, right, the old site that's got all the data and it's providing everything.

This is my technical diagram. We've got the old site. This is where all the stuff is coming from. This is the old walmart.com, OK? Then on top of that, we've got the mobile stuff, and this is where the mobile services are coming from, and that's Wicket (there's a Java framework called Wicket, so I get to draw Ewoks). And the plan is to take Node and jam it in front of these services as a reverse proxy, and what that's going to do, here we go, there's the last part of the thing.

Alright, so what that does is it gave us analytics; it gave us the ability to get Node into production. Sorry, I thought I saw someone waving; OK, we're good, I'm hallucinating, everything is OK, everything is OK.

I mentioned the large bureaucracy: getting stuff into production takes a very long time. There's a whole bunch of hoops you have to go through just to get machines requisitioned. We have clouds that we could have used internally, but we had to requisition the hardware to run those clouds. It was absolutely ridiculous.

So just having a proxy with not much there, getting it in there, was the plan. Once that was there, we had the ability to go in and create any new functionality inside Node. And because we have a proxy that's reporting on the performance of the services underneath, we could use that as a backlog to say this one is not performant, this one is not performant, we need to do something: we could experiment with caching for certain services, or we could rewrite the service in Node just to see. It became our backlog, and there are still parts of it that are our backlog.
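To make that concrete, here is a minimal sketch of the pattern described: a Node reverse proxy sitting in front of an existing backend that also records how long each upstream call takes. This is illustrative only, not Walmart's actual code; the upstream host and port are placeholders.

```js
// Minimal reverse-proxy sketch: forward every request to the legacy services
// and log per-request timing, which can feed a "which service is slow" backlog.
const http = require('http');

const UPSTREAM = { host: 'legacy-services.internal', port: 8080 }; // hypothetical backend

http.createServer((clientReq, clientRes) => {
  const started = Date.now();

  const upstreamReq = http.request({
    host: UPSTREAM.host,
    port: UPSTREAM.port,
    method: clientReq.method,
    path: clientReq.url,
    headers: clientReq.headers
  }, (upstreamRes) => {
    // Stream the upstream response straight back to the client.
    clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
    upstreamRes.pipe(clientRes);

    // Record how long the backend took for this path.
    upstreamRes.on('end', () => {
      console.log(JSON.stringify({
        path: clientReq.url,
        status: upstreamRes.statusCode,
        ms: Date.now() - started
      }));
    });
  });

  upstreamReq.on('error', () => {
    clientRes.writeHead(502);
    clientRes.end('Bad gateway');
  });

  // Stream the client request body through to the upstream service.
  clientReq.pipe(upstreamReq);
}).listen(3000);
```

Aggregating that timing output across requests is what turns the proxy into a prioritized list of services worth caching or rewriting.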

Now, throughout this entire thing, what ended up happening is that we used a whole bunch of different web frameworks. Eran brought one over that was originally based on Express, and we ended up rolling our own called hapi, which some of you all have heard of, and we created a plugin architecture that worked fantastically for these environments where we had loads of different teams. And one of the best things that I enjoyed about the plugin environment is that this is literally our build. The build scripts: we use Jenkins, which TJ loves, thanks.
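For anyone who hasn't seen hapi's plugin model, here is a rough sketch of the shape a plugin takes. The registration API has changed across hapi versions, so treat this as illustrative rather than the code Walmart shipped; the route and handler are made-up examples.

```js
// Illustrative hapi-style plugin (pre-hapi-17 registration API): each team can
// own a plugin like this, and they all get registered into one shared server.
exports.register = function (server, options, next) {
  server.route({
    method: 'GET',
    path: '/stores/{id}',
    handler: function (request, reply) {
      // Hypothetical handler; a real plugin would look the store up somewhere.
      reply({ id: request.params.id, name: 'Example Store' });
    }
  });
  next();
};

exports.register.attributes = {
  name: 'store-service',
  version: '1.0.0'
};
```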

Jenkins runs this stuff on PRs for us, and it runs at certain time intervals, but it's just npm install, npm test, then it removes node_modules, installs production dependencies, shrinkwraps, and tars it up. That's it. Instead of having a centralized app, hapi has a binary, so there is no centralized app. Our deployment artifacts are just a package.json and a configuration file. From those we can rebuild, at any point in time, the production artifacts that went into production at that time, which is pretty cool. It makes it very easy to deploy; that's it. We deploy out to all the hosts, untar it, and start it up. That's all.
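As a hedged sketch of that sequence, here is the same set of steps expressed as a small Node script, assuming a Unix-like build agent. This is not the Jenkins setup being open sourced, just an illustration of the pipeline described above.

```js
// Sketch of the build sequence: full install + test, then a clean
// production-only install that gets shrinkwrapped and tarred up as the artifact.
const { execSync } = require('child_process');
const run = (cmd) => execSync(cmd, { stdio: 'inherit' });

run('npm install');               // full install, including devDependencies
run('npm test');                  // run the test suite against that install
run('rm -rf node_modules');       // throw the dev install away
run('npm install --production');  // reinstall only what production needs
run('npm shrinkwrap');            // pin the exact dependency tree that was tested
run('tar -czf artifact.tar.gz package.json npm-shrinkwrap.json node_modules');
```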

It makes it very, very simple. In fact, I will add the links on here, but we're also open sourcing all of the stuff that we used to set up Jenkins as part of LXJS; those are all going to be open sourced, and people can use them as much as they would like. So this led up to Black Friday: after Thanksgiving in America, everyone goes shopping, and it's called Black Friday.

Traditionally, it's said the name comes from when a company goes from being in the red to being in the black, because they sell a lot of stuff, and really it's horrible. Quite often our family will go up into the mountains on Black Friday to completely get away from everything, and go skiing, sledding, or, you know, play banjos or something, just to get away from all of that, because it's really, really crazy. And so last year during Black Friday, instead of doing that, our entire team live-tweeted everything that happened with having Node in production on Black Friday.

We had, I think, over 500 million unique users at Walmart in just one day, and a large portion of those went to mobile. For each one that went to mobile, at this point, the entire Walmart mobile website is being served via hapi in Node. All of the services for all of the other native experiences are being served through Node.

The other thing that's cool is that we created an analytics system that is also in Node. So for every single request that's happening from mobile web or from any of the native experiences, you're getting between three and six full HTTP requests hitting Node servers, for hundreds of millions of people. Did anybody here follow Node Black Friday?

OK, a couple of folks. Yes, it was pretty good, yeah? So, wow, that's a little bit down there. So Eran was tweeting, and it was boring! It was really cool to see, but we were bored. We went from somewhere where literally three or four times a week people were getting called at night to be woken up on these service calls, to: once, one of our CPUs spiked to 2% usage on one of our machines.

There was nothing going on, but something ended up happening that was really, really kind of cool. Anybody remember what happened here? So these represent the memory usage for each of our VMs running on our stack, and you can see how some of them are starting to drop. If you know what's going on, don't say.

Does anybody have any idea about this? Highest traffic day of the year, anyone? Anyone? So there's that one weird outlier; we just left him alone, because everything else was going fine. Alright, that's a deployment. On the heaviest traffic day of the year, we deployed, just because we could. We didn't lose any revenue, nothing bad happened. There was something on the m. website that folks felt they needed to change, and so we deployed it. It was kind of naughty, I guess: by the time upper management had found out, "No, you can't deploy," we were already finished. The deployments go so quickly; npm install and npm test just happen ridiculously fast, and after that we were just SCPing these binaries over to all the individual VMs, so there was no time involved in it whatsoever.

We do it in stages, so it's checked, and it usually takes about 15 minutes; but if all of those checks weren't there, it would be done within just a few minutes from npm install to having been deployed to all of them, because it happens ridiculously fast. But yes, we deployed in the middle of Black Friday, and that was all done by Lloyd Benson, who handles the deploying and a lot of our Jenkins work. He's our devops guy, really fantastic; TJ can attest to that. He did another Node on the Road talk which you all should check out; it's really good.

So this was Eran. This is Eran pretty much during the whole thing. We literally were on a Google hangout; it went for three days straight, and I know I was on there for about 36 hours straight. The most exciting thing that happened was Eran started playing One Direction at like three o'clock in the morning when everyone was falling asleep, and he just started blaring it through the connection. So it was boring, and that was awesome.

So, what's happened since then? The team has grown. Back then, there were only five of us, and now we have 20. That was in November of last year, so we've gone from five to 20 in about six months. The reason for that is because instead of just having mobile be fronted by Node, Node is now moving out: we're doing projects right now to have Node be the microservices layer for all of walmart.com, which is, you know, really cool. So apparently, it went really well.

Now, with that happening, we couldn't be mobile services anymore, and I thought with the success that we'd be able to pick a cool team name. Something cool, something that would be good. I was like, the away team. We should be the away team, like Star Trek. That'll be cool, that'll be a cool name for our team. And they're like, nope. And I was like, well, what if we throw back to goofy '80s and '90s hip hop terminology and become Hammer Team? You know, you're Eran Hammer, and it'll be like Hammer time, except a play on words. And they're like, no, we've got this awesome name. We've got this rad name that's going to be awesome for our team.

It's going to be so good, so we are the Client Services team.
But it's been good. Ultimately, though, there were only four or five of us, depending on when you look at it, getting paid by Walmart. None of this work would have been possible without all of the work from the community at large. Not just the folks that Nuno was mentioning who are making commits to Node, and everybody making commits to Node, but quite literally, like TJ. So one day we were going to go and do a deployment, right?

So, you want to do nice, small, incremental deployments, right? Good stuff. Instead, we decided in one deployment to move a major Node version, switch our deployment operating system, and release a whole new version of hapi, and deploy all of that at the same time. Everything blew up, and it blew up with way more than the memory leak, but once we winnowed it down we found this memory leak that he's spoken about quite a bit, because he is the one that fixed it, and it was crazy. But that's a good example of the support that we've received. Also, if you all have had a chance to meet Trevor Norris, he's always offering to help; he's like, "I've got this new stuff that I think you'll be able to use, it will speed everything up and it will be awesome." Folks are always helping out, and it's great. This is adventure cat, I love this guy.

The JavaScript community at large is a wonderful community to be a part of. Worldwide, it's one of the most generous and friendly groups, anywhere from front-end work all the way to actual Node and hardware NodeBots stuff. Everyone is just a pleasure to be around, and LXJS was a great example of that. In fact, there was a phenomenal talk that's already posted publicly by Mike Brevoort, and one of the things that he said was, "We stand near the front of the line… instead of clawing your way forward, turn around and pull someone else up." It says it right up there; I was saying it incorrectly.

So that's something that we can do. If you're not already, get involved in the community. Nuno was talking about technical contributions; those are important and greatly needed. TJ was talking about documentation contributions; those are important and greatly needed. You can also just be friendly. You can be friendly to someone on IRC on their first time joining up; you just say, "Hey, welcome." You can be friendly on Twitter to someone announcing, say, "I'm checking out Node": "Great, is there anything we can help you with? What's up?" You can be friendly in your local meetup groups, or start a local meetup group. Start talking to folks about it. Just being friendly, and smiling, and talking to people can open up remarkable opportunities both for you and that person. So get involved, and by doing so you're making the entire community better for all of us, and that's all I have. Join us, join the community. That's all I have, thank you, everybody.