Node.js at PayPal

Jeff Harrell talks about PayPal's use of Node.js for their web applications and how it's spreading throughout the company.

Jeff Harrell, Director of UI Engineering

PayPal was all archaic; we wanted to innovate faster, and so I was brought in to figure out how to do that. And so, on a cool note, I was there, I don't know, like 6, 7, not even that, 3-4 months, and I ended up actually getting fed up with the existing systems we had, and just sort of throwing everything aside and starting to write code in Node. My team at that point was, like, me. So we were a little side pocket, and we ended up actually starting out writing in Node, and moving forward with it, and from there things have progressed. I'll talk to you a little bit about that story, but now there's a whole team dedicated to Node. All of the engineers in PayPal that are working on web apps are writing in Node, and more.

So, let us talk about that. So, this started about a year ago. So, I guess 6, 7, 8 months after I got hired, we ended up saying, we had played with Node enough, we had played with it internally, and we, like a lot of other companies, had started using it for prototypes. That's often how Node seems to sneak in the door of large companies: people just wanting to play with it. And so, we were building things as prototypes. We had actually, in an odd turn of events (we have a Java stack that was our predecessor), set things up so that we could effectively copy a lot of the code over from the Java stack to the Node stack, and we could prototype in Node and then copy it over to Java, stick it in production, and things would work. That was cool. Ideally that was sort of the incentive behind it; we wanted to iterate faster, and that is why we ended up going down that route.

Node allowed us to do that. That's why Node was specific to that. But after about a year we started saying, OK, cool, it is probably a good idea to go visit production with this. And so we didn't take a small app; we actually wanted to prove that it would work on something large, something that we really need to scale with. So we went after /myaccount, and that is effectively where you go if you log in; you end up hitting what we consider to be transactions, but it now resolves to myaccount.

And that's, I don't know, like one of the top three pages that we have that accepts traffic. So, I mentioned that, again, a lot of the incentive behind Node for PayPal was design, right? So, the existing design, if you go log in, if you don't get the new version, you get something that looks like it is from the 1990s. And that's actually the case with a lot of PayPal's site; that is what we want to get away from. In an interesting sense, that's what Node is helping us get away from. And so, we ended up building it with Node, and we built it with Node in three months. And actually, I am not going into super detail around that (I have a blog post I did), but we played it safe. We had a lot of faith in you guys, but we also built it in Java.

And so it gave us a great metric, because our key performance thing was not actually app performance, but developer performance. If we could build it quicker and it performed roughly the same on the website, that's all we wanted. And so, we ended up actually building it in half the amount of time with a team that had familiarity with JavaScript, but had never touched Node before.

And so, the cool news is, right now it's live, and I don't know, relatively speaking like hopefully that page doesn't look like it is from the 1990s, I like it, right? But it's live, it is running on Node.js, and Dav's point was really interesting like, we have had very few issues in a lot of just production things with Node.

It has just actually worked for us, right? We've had our issues here and there, and I'll talk about a few, but just getting this app up and running and getting it pushed live to the site, we had to fit it into a lot of existing legacy infrastructure, and it was really easy to do and really scalable. So today, jumping forward, that was actually a year ago, right?

So, now we have 20+ Node apps. Actually this number seems to like grow by an order of magnitude every time I mention it, so it is probably way more than that. Although I did just update it last night, but we have 20+ Node apps that are being built, and these are things, again they are not small apps, they are things like account, right?

They are things like checkout, right? That is the lion's share of PayPal's business, that's where we get the vast majority of our money, and that's actually running on Node now. We have our home pages, which actually are our most hit pages; those are running on Node now. A vast majority of our onboarding stuff, which is how we acquire customers, is running on Node now. And just as a tip, you can easily tell the Node apps, because they look better than the old apps. Again, the design thing, right?

In a cool turn of events, we didn't have a lot of JavaScript developers, and so that was an interesting caveat for us. We had a lot of Java developers. We had a smaller subset of UI engineers who would work on the browser code, and they knew JavaScript, but they knew effectively single-user JavaScript. And so, we've actually spent the last couple of months training a bunch of our developers, and at the moment, we have converted about 200 people over to Node. So that is effectively Java developers who were working on apps, who did not run away screaming entirely. They were a little bit skeptical at first, but it has been fun to train them because they all go through things like, signs of, it is like the, what is the, seven signs of death or…

What am I looking for? Psychology people, help me out. Where they are like, JavaScript is stupid, that's a kid's toy, whatever; I mean, fine, I want to build a web app, so I'll do it. And then they are like, it is not stupid, but it is like, what? Yeah, it's not stupid, but it's not type-safe either. You can't build a serious app without type safety. And then it's like, well, I mean, OK, type safety, but this is cool; it's quick. It's all right. Sweet. Then they jump over a couple more steps, and eventually they are like, I hate Java. You guys are stupid. And it is really funny how that has happened, because again, we have about 200 people. We will probably have double, if not triple, that by the time we are done with this. And it's super hilarious, because I've had countless people approach me in the halls who, a year ago, were like, what the hell are you guys doing with Node, this is stupid. And now they are like, this Node thing is awesome. How do we do this? We really like doing this.

But it is great. We have been able to convert people who historically a year ago were not even interested in it, and now they are actually writing Node apps, they are really excited about it, and they are writing them much quicker than we have historically, which helps us out in the long run, obviously as a company.

So, a little bit of detail about our apps. This is sort of true: much like Dav, we just got .26 approved, so it is rolling out imminently. I don't know, maybe now; I'm not looking at my phone. So, apart from that, they're all running on .25, obviously. We are keeping up to date as often as possible. Dav's comment about you guys moving quick and pushing new things is totally accurate. It's fun to keep up, right?

It's like, oh shit! We really need that memory leak fixed. Go push. At the same time (we're not doing it at the moment; we didn't jump on this, obviously, till a little bit later than some companies, but we started on 0.8), our plans are to support two stable versions at the same time, just to give apps a segue to upgrade. At the moment, all apps that are running are actually running on 0.10. So it is not a big deal for us, but when 0.12 comes out, obviously early adopters in our company are going to jump on it, and then we'll sort of support 0.10 for a little bit, and then eventually migrate everyone off of it.

App-wise at least, we are not staying on top of the unstable versions and testing our apps against them, and part of that is almost like delegation of concerns: we are not as worried about unstable although, yes, it is a glimpse of what is to come. But we are actually testing that with internal framework pieces, and that allows us to get an idea of what is going to break, for the most part; at least we get some coverage there. So, one interesting thing that comes up; this is a little bit oversimplified, but I want to call it out for a couple of reasons.

So, we are a company that's all SSL, and we are also a company that has a distributed network of services and underlying functionality. So, the apps that we started with, and that we are currently building, are web apps for the most part. And so, these are all fronted with nginx. Nginx is used for a couple of reasons; again, this is a little bit oversimplified. Nginx terminates SSL, it deals with GZIP, and it deals with load balancing internal to that box, across however many cores you have there. It is basically taking concerns that we don't want Node to deal with, specifically inbound SSL in this case, and solving them elsewhere.

There are obviously other ways to do this; that is how we ended up with it. Our Node processes are clustered; they are clustered with PM2. Ultimately, you hear a lot of people talk about how (it actually came up a couple of times earlier) Node is great for APIs because it can go pull together a bunch of resources and send you back data.

Well, ultimately, at the end of the day a webpage is really sort of an API; it's just serving HTML. And so our Node apps are, for the most part, actually orchestrating downstream APIs, and then on top of that, we're using Dust for rendering; it's JavaScript templating. Sometimes the web app renders, but a lot of times it doesn't, because it gets deferred to the client. But importantly, downstream from there we have a multitude of APIs, some of which are like a decade old, some of which are five years old, and some of which are up-and-coming newer ones that we are working on.
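The "a webpage is really sort of an API" idea can be sketched roughly like this in TypeScript. This is only an illustration, not PayPal's code: `Profile`, `Activity`, `fakeService`, `buildPageModel` and `renderHtml` are hypothetical names, the downstream services are stubbed with timers, and `renderHtml` is a trivial stand-in for a real template engine such as Dust.

```typescript
// Hypothetical shapes for two downstream API responses (not real PayPal APIs).
interface Profile { name: string }
interface Activity { items: string[] }

// A downstream call is modelled as any function returning a Promise of JSON-like data.
type Downstream<T> = () => Promise<T>;

// Stand-in for a slow downstream service: resolves with `value` after `ms` milliseconds.
function fakeService<T>(value: T, ms: number): Downstream<T> {
  return () => new Promise<T>((resolve) => setTimeout(() => resolve(value), ms));
}

// Fan out to both services at once, wait for both, then build the page model.
// Total latency is roughly the slowest call, not the sum of the calls.
async function buildPageModel(
  getProfile: Downstream<Profile>,
  getActivity: Downstream<Activity>,
): Promise<{ profile: Profile; activity: Activity }> {
  const [profile, activity] = await Promise.all([getProfile(), getActivity()]);
  return { profile, activity };
}

// Trivial stand-in for a template engine: turns the model into HTML.
// A real renderer would also escape the values; this sketch does not.
function renderHtml(model: { profile: Profile; activity: Activity }): string {
  const rows = model.activity.items.map((item) => `<li>${item}</li>`).join("");
  return `<h1>${model.profile.name}</h1><ul>${rows}</ul>`;
}
```

In a real app, the model would either be rendered on the server or sent to the browser as JSON, which matches the "sometimes the web app renders, a lot of times it doesn't" point.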

A lot of these APIs are higher latency, and more often than not it's probably because they just need to be rewritten; they are just old, they need a workout. Sometimes they actually are doing data and work that needs that. Node has been interesting and helpful in that case, just for dealing with back pressure at times.

For the most part, they are all dealing with JSON, but the important fact here is that they are all dealing in SSL. So, once you get into our network we don't stop with SSL. Node doesn't listen on it, but it talks SSL again; we do downstream SSL. And that has actually been our biggest nitpick with Node to date. So think about it: you are orchestrating a lot of APIs, and all of those APIs are SSL, and so you have to talk SSL to them. And so what we found was that we really needed to squeeze every bit of performance out of it, but while we are doing that, we end up blowing through our CPU use, and you can attribute this ultimately to the cipher set for SSL. And so, effectively think of, there's a great graphic I should have, like, our boxes on fire, yeah.

There are ways to deal with that though: you can actually downgrade the ciphers. The ciphers that come out of the box with OpenSSL are fairly strict, especially once you're in network. So you can downgrade those; we dropped the CPU use down quite dramatically just by tuning that a bit. And actually I don't know if we'll distribute the slides, but if not, I will find a way to get the link out. One of the guys on my team has a great gist around a shit load of tuning we did around SSL and how we did it. It was supposed to be a blog post at one point.

But anyway, so we dropped that down, but then we ran into another thing with SSL. One of the things we have had on all of our previous tech stacks has been the concept of SSL resumption. It is not unique to us, but we would take advantage of it. If you think of it in short terms, whenever you make an SSL connection you are like, hey, cool, here, OK, go, and you make this little handshake (I know, if you're a crypto person, don't laugh at me) where you end up saying, cool, now you are secure. So you can pre-cache those; effectively that would be the idea of SSL resumption, where you pre-cache a bunch of handshakes, you reuse them over again, and you make a savings on that.

This is one of the things that, out of the box, it originally didn't look like Node supported, and so we ended up bugging a bunch of the core team over the course of time on GitHub and email. It almost came down to stalking, but eventually we ended up finding a way to actually implement it, which was sort of an undocumented way, and it has worked out for us in the long run.
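The "pre-cache a bunch of handshakes and reuse them" idea can be illustrated with a small session cache in TypeScript. This is a sketch under stated assumptions, not the undocumented approach mentioned in the talk (which is not described here): `TlsSessionCache`, `save` and `lookup` are hypothetical names, and the session is treated as opaque bytes.

```typescript
// Opaque resumption data. In Node.js this would be the Buffer handed out by a
// TLS socket (Buffer is a Uint8Array subclass); here it is just bytes.
type SessionData = Uint8Array;

// Small cache of TLS sessions keyed by "host:port".
// When the cache is full, the entry that was saved longest ago is evicted.
class TlsSessionCache {
  private readonly sessions = new Map<string, SessionData>();

  constructor(private readonly maxEntries: number = 100) {}

  private static key(host: string, port: number): string {
    return `${host}:${port}`;
  }

  // Remember the session negotiated with a downstream host.
  save(host: string, port: number, session: SessionData): void {
    const k = TlsSessionCache.key(host, port);
    this.sessions.delete(k); // re-inserting moves the key to the newest position
    this.sessions.set(k, session);
    if (this.sessions.size > this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest one.
      const oldest = this.sessions.keys().next().value;
      if (oldest !== undefined) this.sessions.delete(oldest);
    }
  }

  // Session to offer on the next connection; undefined means a full handshake.
  lookup(host: string, port: number): SessionData | undefined {
    return this.sessions.get(TlsSessionCache.key(host, port));
  }

  get size(): number {
    return this.sessions.size;
  }
}
```

In current Node.js releases, a cache like this could be wired up by passing `session: cache.lookup(host, port)` in the options to `tls.connect` and calling `cache.save(host, port, session)` from the socket's `'session'` event. `https.Agent` also keeps its own session cache, sized by its `maxCachedSessions` option, so a hand-rolled cache is only needed for connections made outside an agent.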

So this was a concern; it just took a little bit of extra mileage on our part, both to solve it technically and also to be like, hey guys, SSL. So the good news is, we have also been running numbers and sort of providing feedback here and there, and SSL performance in 0.11, and hopefully in 0.12, is definitely better.

These are just sort of our random numbers. I am sure there are actually official numbers. Maybe, maybe not. Maybe these are official. SSL performance wise, we are actually seeing much better throughput, and so that has actually been really great. And so, hopefully part of that is attributed to us sort of standing up as a company and being like, hey, we only deal in SSL, this is one of the main concerns. It is really almost the only concern we have with Node at the moment, and we are comfortable it is going to get solved, but it was a little hurdle for us to deal with. But the cool thing is, the team is responsive to that. We've been able to deal with them, and not only that, but to walk through and identify, like, here are some real use cases that you end up with in larger companies that we need to get into Node, so that it helps mature the language, it helps mature the environment.

Outside of that, just sort of rehashing, our engineers are loving Node. It's been an easy thing. I think a year ago, when we started this, we never even thought it would be as successful within the company as it has turned out to be, and it is taking over like wildfire. We are adopting it in the web app layer, we're adopting it in the API layer, and for the most part, everyone that's been doing it has sort of turned from, like, the dull engineer walking around, not super happy, to, like, I am really excited because we are using something cutting edge, something exciting.

So, Node is awesome. I have questions, but we will deal with the questions later. So on that note, I am going to hand it back to TJ so we can hear about 0.12.
