We're really excited to be here. Thank you to Node on the Road for having us. We're so excited, in fact, that we had to split the excitement between two people to give this talk. My name is Craig Spaeth. I work for Artsy on the engineering team, and I've been with Artsy forever in startup years, which is almost four years now.
My name is Brennan; I direct the web team at Artsy, and I've been at Artsy for close to three years. For those who don't know, Artsy is a web startup — we're the Pandora for fine arts. Our mission is to bring all the world's art online to anybody with an internet connection. Today we're going to talk about how we recently pointed all of our production traffic to Node.
And not only to Node, but to Node apps that share rendering and code between server and client. We're going to break that down into why we transitioned to Node, how we transitioned, what our stack and technology architecture currently look like, and then wrap it up with some pain points and some wins that we've gotten from Node.
So why did we migrate to Node? To go into that, we have to take a brief look at Artsy's evolution. It's actually funny that you brought up PHP, because Artsy was a PHP prototype when I first joined. That didn't last very long. So we rebooted onto Rails and we were like, great! Rails is super productive, we love Rails.
Yeah, the complexity of the stack is pretty difficult to quantify exactly, but it really manifested itself for us when bringing on new engineers. We had a really talented engineer join who was familiar with Rails, familiar with Node, familiar with Backbone — he'd been very productive in some of the Node apps that we'll talk about later — and even he had a hard time ramping up on this stack.
So how did we start to introduce Node into this big stack that's hard to untangle? First we had a new application we needed to build: a CMS, a content management system that gives our partners the ability to manage their inventory on Artsy. That was our first adventure into Node, and we wanted to use it to solve one of our big pain points, which was testing. That was a success — it's still a single-page Backbone app, but we made testing a lot faster by moving it into the Node process and testing all of our client-side code on the server, and that was a lot nicer.

Next we had the opportunity to get a lot of wins out of our mobile website. Scaling down a giant Backbone app into something responsive — just taking the responsive approach and saying "let's change the CSS" — wasn't working for us, and of course giant megabyte-sized assets weren't very friendly to mobile clients. So we took this as an opportunity to tackle some other problems, like the lack of reuse, the large monolithic assets, and sharing rendering and code between server and client. This was a four-month project, done mostly by myself with the help of a couple of engineers here and there. In that short amount of time we were able to prove the concept and also bring a much better mobile experience.

With that, we wrapped up the architecture, found ways to open-source the bits that made sense, and moved on to our desktop web application. From there we moved the remaining Backbone UI onto our new architecture, and again, in a matter of four months — with four engineers this time — we were able to move our entire front end from the Rails stack onto Node. Now all of our requests come through Node, and we use Rails just for the API.
So now Brennan is going to talk about what it's like to transition into Node without killing the business.
It's really important to remember that rewrites kill companies, especially in a very fast-paced startup environment. Just slowing down future development at all can really give your competitors the edge they need to destroy you. We were very cognizant of that, so we wanted to do this rewrite very, very quickly, and we did a few things to make that happen. The first was to communicate with the team — the first step of any transition or any rewrite is people.
We basically said: we need to stop building anything that isn't critical or wasn't previously contractually obligated, and communicate that to the broader team. Then we actually restructured the engineering team itself. We had one engineer who was the point person for all bugs and all urgent requests, and they would manage that while the rest of the team worked in parallel to build out the site page by page.
The other benefit of that is getting people familiar with parts of the site they've never worked on before. Say Craig originally built the artwork page — we'd put him on the artist page instead. Or if someone hadn't touched our auction functionality and we wanted to get more people involved in that, it was a great time to do so. After we addressed the people issues, we want to talk about the infrastructure changes we made to make this easy.
Oops! I hit something — anyway. We actually used nginx, not HAProxy. We put nginx in front of the Rails app and the new Node app and added page-by-page redirects. Say we had built the artist page in our new app: we would add a redirect there, nginx would serve the new page, and we would always have the old one to fall back on if the Node app crashed, or didn't scale to meet our traffic demands, or whatever.
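That page-by-page routing can be sketched roughly like this — a hypothetical nginx config, where the upstream names, ports, and route are made up for illustration, not Artsy's actual setup:

```nginx
# Both apps run side by side; nginx decides per-path who serves the request.
upstream rails_app { server 127.0.0.1:3000; }
upstream node_app  { server 127.0.0.1:5000; }

server {
  listen 80;

  # Default: everything still goes to the old Rails app.
  location / {
    proxy_pass http://rails_app;
  }

  # Pages rebuilt in Node get routed to the new app, one location at a time.
  location ~ ^/artist/ {
    proxy_pass http://node_app;
  }
}
```

Rolling back a page is then just deleting its `location` block, which is what makes the fallback so cheap.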
We never actually had to do that, but it was really nice having the backup there. And it also allowed us to scale this up while deleting the old code in the old app, page by page. That was really, really great.
After we got to the point where we had just one page left, we flipped the switch and put Node in front — you can see that on the right here. We still have one page, the post page, which is being served by the old Rails app, but that's simply because we're still waiting on designs — which is the first time in our existence that we've ever waited on designers for a feature.
So we use node-http-proxy, which is this awesome module. It's just mounted inside of our main Node server; we whitelist the post page and proxy it through, and all is well and good.
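A rough sketch of what that mounting can look like with node-http-proxy's 1.x API — the upstream URL and route here are illustrative, not the actual code:

```javascript
// Sketch: whitelist one legacy path and proxy it to the old Rails app,
// while every other route is handled by the Node app itself.
var express = require('express');
var httpProxy = require('http-proxy');

var app = express();
var proxy = httpProxy.createProxyServer({
  target: 'http://legacy-rails.example.com' // hypothetical upstream
});

// Only the whitelisted page falls through to Rails.
app.all('/post*', function (req, res) {
  proxy.web(req, res, function (err) {
    // If the legacy app is unreachable, fail loudly rather than hang.
    res.statusCode = 502;
    res.end('Bad gateway');
  });
});
```

The nice property is that deleting this one route handler is all it takes to finish the migration.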
Alright. Here's what our stack looks like now. We're not running Node 0.8 — we're not one of the stragglers — we're running 0.10.26.
We use Express, Backbone, and Browserify to glue together our client and server rendering pattern, which Craig is going to talk about. We have our Ruby API back end, which Craig mentioned, and we host our app on Heroku — we currently use two Heroku servers to host the whole thing, which is great.
Our assets are deployed to S3 and fronted by CloudFront. We use a Makefile to compile our app and for deploys, and we run all our tests on Travis and Jenkins. Actually, the most interesting thing about our stack is probably our use of Redis caching. We initially experimented with caching on the client side: we wanted to put the whole site behind a CDN.
That would obviously be very, very fast. But we looked at it and it's very complicated, and it makes deploys potentially risky: since the asset paths change on every deploy, you risk someone viewing old content that's cached on the CDN. That's very bad. So we thought, why not cache the full HTML of each page rendered in Node — in Redis, say? We found that was difficult too, since we need to serve different content for logged-in and logged-out users. Too complicated. So instead we just overrode Backbone.sync to get and set data in Redis, and it's great. It shaved about 100 milliseconds off our response time, it's been running for about 10 months, and it's great.
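The Backbone.sync override looks roughly like this — a sketch under assumptions: the key scheme and the 60-second TTL are made up, and error handling is elided:

```javascript
// Sketch: wrap Backbone.sync on the server so reads are memoized in Redis
// while writes pass straight through to the API.
var Backbone = require('backbone');
var redis = require('redis');

var client = redis.createClient();
var originalSync = Backbone.sync;

Backbone.sync = function (method, model, options) {
  // Only cache reads; create/update/delete must always hit the API.
  if (method !== 'read') return originalSync(method, model, options);

  var url = typeof model.url === 'function' ? model.url() : model.url;
  var key = 'cache:' + url; // hypothetical key scheme

  client.get(key, function (err, cached) {
    if (!err && cached) return options.success(JSON.parse(cached));

    // Cache miss: fetch from the API, then store with a short TTL
    // so content stays reasonably fresh.
    var success = options.success;
    options.success = function (data) {
      client.setex(key, 60, JSON.stringify(data));
      success(data);
    };
    originalSync(method, model, options);
  });
};
```

Because this sits behind Backbone's normal fetch path, no calling code changes — models just get faster.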
So I highly recommend that.
I'll get a little bit into how we do the shared client/server stuff and our Node architecture in general. We open-sourced this thing called Ezel — it's the old Dutch spelling of easel — and it's basically the boilerplate that we use to bootstrap our Node apps; as I showed, it bootstrapped our desktop and mobile web apps. Essentially, Ezel is a very lightweight boilerplate. The easiest way to explain it is Express and Backbone glued together with Browserify.
It has three philosophies: modularity, flexibility, and isomorphism. Isomorphic basically means sharing rendering and code between client and server — that can go down to the module level, and we try to do as much of it as sanely possible. So let's dive into the modularity and flexibility of Ezel.
Ezel has two very simple patterns for modularizing your application. One of them is apps, which are sub-Express applications mounted into the larger Express project. Actually, I'll go ahead and jump to a slide to show that, because I don't think we're going to pull this up later. You can see it uses app.use and then requires that specific sub-app, and in our production application that spans many lines of app mounting — but it's a nice way to break up your application.
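A minimal sketch of that sub-app pattern — the app names and routes here are hypothetical:

```javascript
// apps/artist/index.js — a self-contained Express app with its own
// routes (and, in practice, its own templates and assets).
var express = require('express');
var app = module.exports = express();

app.get('/artist/:id', function (req, res) {
  res.send('artist page for ' + req.params.id);
});

// server.js — the main project just mounts each sub-app.
// Because an Express app is itself valid middleware, app.use works directly.
var server = express();
server.use(require('./apps/artist'));
server.use(require('./apps/artwork'));
server.use(require('./apps/home'));
```

Each app stays small enough to reason about on its own, and the main server file reads like a table of contents.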
If you want more details, go to ezeljs.com — it explains all the bits of it, and you can bootstrap your own projects with it.
So, that all sounds well and good. We did have a few problems, though. The first was syncing auth across all of our various Node apps. We found that it's a pain: say we add some extra error handling — we'd have to add it to every one of our Node apps. The way we addressed this was to make a single Node module for all of our auth handling and have all these various apps install it. Done.
It wasn't immediately obvious to us, but it turned out to be a great pattern — highly recommend it. The next thing is something we're dealing with today, actually: we have a memory leak in our application, and we don't have a lot of transparency into what exactly is leaking or what the problem is. The app does seem to intelligently restart itself, which is great, so there are no user-facing issues, but it's something we'd like to figure out in the future. Also, the npm registry has gone down at some critical times for us, so we like to check our Node modules into our GitHub repository. This seems to be a big win all round.
It gives us a lot of confidence that the code we're testing is the code that we're actually going to be deploying to production. And it also allows us to, I don't know. It's a win. Highly recommend it.
The one thing that we would like is some way to ignore node_modules commits in GitHub diffs, for example. The path we've taken around this is to say: if you're submitting something that requires a Node module change, or a new Node module, submit a second PR that's just the node_modules change, and isolate that from the actual new code that we're going to code-review.
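One partial mitigation (our suggestion here, not something from the talk) is git's own attributes mechanism: marking node_modules as not diffable keeps the vendored code from flooding local diffs, although GitHub's PR view applies its own rules:

```
# .gitattributes — suppress textual diffs for vendored modules
node_modules/** -diff
```

This makes `git diff` and `git log -p` report the files as binary instead of printing thousands of vendored lines.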
This pattern has helped a lot, but it's still a little bit of a kludge. The next is integration testing. As we talked about, we've separated the web client from the API. This inherently creates testing problems: if, say, the API changes and the client is stubbing the API, the tests are going to pass, but it's going to fail in production.
So we built a lightweight integration-testing system, which checks out the latest version of our API, checks out the latest version of our Node apps, runs them in parallel, and runs through a few critical business flows — like buying stuff on the site and creating an account and whatnot — to make sure that all works.
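In outline, that runner looks something like the following — a hypothetical shell sketch in which the repo URLs, ports, environment variables, and npm script names are all made up:

```shell
#!/bin/bash
# Sketch: boot the latest API and web client together, then exercise
# the critical flows against the live pair.
set -e

git clone --depth 1 https://github.com/example/api.git
git clone --depth 1 https://github.com/example/web.git

# Start both servers in the background.
(cd api && bundle install && bundle exec rails server -p 3000) &
(cd web && npm install && PORT=5000 API_URL=http://localhost:3000 npm start) &

sleep 15  # crude wait for both processes to boot

# Run the headless flows (signup, checkout, ...) against the running stack.
(cd web && npm run integration)
```

A real runner would poll the ports instead of sleeping and clean up the background processes, but the shape is the point: no stubs, real API, real client.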
It took a little while to set up, but it's been really, really awesome for us, and it's already spotted a few bugs.
Yeah, so Selenium has come back to haunt us, but in a good way. So, despite the little problems that Brennan mentioned, we've really just seen wins across the board from Node. We're very happy with Node. The community is amazing, and the module ecosystem is great.
Having many different solutions — some people might think Node's not opinionated enough, but I think it's great to have many different options to choose from. Also, isomorphic modules — JSDOM, Browserify, and all this — leverage an idea that seemed crazy to people outside of the community, but it's actually very easily attainable.
That also means npm is a great package manager for both server and client: for our internal modules we just have our package.json point to GitHub URLs, and we use npm for server/client dependency management, and it's awesome. We haven't had any need to dive into core, so that's cool. Productivity is light years better. Like I said, a two-hour test suite has turned into 1,000-plus tests that take under 5 minutes. Night and day.
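For reference, pointing a package.json dependency at GitHub looks something like this — the module name, version, and tag are invented for illustration:

```json
{
  "dependencies": {
    "express": "3.4.x",
    "some-internal-module": "git://github.com/example/some-internal-module.git#v1.0.0"
  }
}
```

npm treats the part after `#` as a committish, so pinning internal modules to a tag or SHA keeps installs reproducible without publishing to the public registry.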
We're talking about unit tests and headless integration tests with Zombie.js, which uses JSDOM, so we're able to run full-stack tests, all headless, in a matter of minutes. It's awesome.
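A Zombie.js test in this style might look like the following — a sketch using Mocha, where the port, route, selector, and expected text are made up, and Zombie's 2.x assertion API is assumed:

```javascript
var Browser = require('zombie');

describe('artwork page', function () {
  // Point the headless browser at the locally running app (port is hypothetical).
  var browser = new Browser({ site: 'http://localhost:5000' });

  it('server-renders and boots headlessly in JSDOM', function (done) {
    browser.visit('/artwork/example-work', function (err) {
      if (err) return done(err);
      browser.assert.success();                   // got a 2xx response
      browser.assert.text('h1', 'Example Work');  // made-up expectation
      done();
    });
  });
});
```

Because Zombie executes the page's JavaScript in JSDOM, the same test exercises both the server-rendered HTML and the client-side boot, without a real browser.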
Our deploy times went down significantly — they're around five minutes now, and most of that is just minifying assets. We deploy around five times a day; deploys used to be a lot slower, and it used to be a rush to get deploys in even once a week, considering our slow builds and slow tests. Performance has been great: our page-load speed was cut in half because we're able to share the rendering. And this wasn't necessarily a focus of ours, but an awesome win: we used to scale up to 40+ Rails servers to serve our production traffic. Now we have just two Node servers — the minimum above Heroku's free tier — running our production website. Awesome.
And the list goes on: we get SEO benefits because our pages load faster and we're rendering more on the server. It's just amazing. So overall, we love Node at Artsy, we're super excited to see what comes of it, and thank you very much.
Node in Production
See techniques for deploying a large-scale, high-uptime production cluster.