Node.js on the Road: Dominiek Ter Heide

Node.js on the Road is an event series aimed at sharing Node.js production user stories with the broader community. Watch for key learnings, benefits, and patterns around deploying Node.js.

Dominiek Ter Heide talks about Node.js at Bottlenose.

SPEAKER:
Dominiek Ter Heide CTO
Bottlenose

Hello there, I'm Dominiek. I'm the CTO Co-founder of Bottlenose. Today I will tell you a little bit about how we are using Node.js, but before that I want to tell you a little bit about how I got into Node.js.

So, after I graduated college, I decided I wanted to see some more of the world. I'm from a little village in the Netherlands called Nigtevecht, and I wanted the exact opposite of that, so Tokyo in Japan was a good candidate. This was back in 2006, and Ruby on Rails was the hot thing before all the PHP people joined, no offense. In all of Tokyo there were only three companies doing Ruby on Rails, which is ironic because Ruby was actually created in Japan many years before that.

Anyway, I landed at an e-learning startup. They built a learning system that helps people memorize things: the system would keep track of the words you were studying and remind you to study them again. This was initially CD-ROM software, but we made it Web 2.0, put a social network around it, and BOOM!

We grew from one user to a million Japanese users within 18 months. I started as a junior engineer there, but at some point I became the CTO, so I was very involved with scaling this behemoth of a complex system to all those users. There are some lessons I learned about consumer-scale systems there: the database and I/O are always the major bottleneck, and the only way to offset that is to cache the hell out of everything.

One mantra I've learned is "Less Web, More App": move complexity from the server side down to the browser, keep your UI logic in the browser, and have small JSON payloads going back and forth between the app and the APIs. Background processing is necessary, but it's also a pain because it increases complexity and points of failure. Around the same time, Node.js started becoming the new cool kid on the block, and I liked it a lot because it resonated with my ideal of the server side as a data broker: no UI logic on the server, just event-driven code.

A lot of people dislike JavaScript as a language, but the way I saw it, JavaScript was getting so much adoption that it's a force you shouldn't try to stop; you should just go with it. Embrace it. So in 2010 I decided to move to a little peninsula south of Yokohama called Enoshima, where I took surfing lessons and spent all my other time programming. This is where I started developing a system for figuring out someone's interests: you would give it someone's social media username, it would go through all the data, and it would try to build an interest graph out of it using specialized natural language processing. All of this was done in JavaScript, which could run on Node but also in the browser to offload computation to users' machines.

Fast forward: this became the backend technology for my current company, Bottlenose, which I founded together with Nova Spivack in early 2011. What we do is turn realtime big data into actionable insights; I'll tell you more about that in a second. We have Fortune 500 customers using Bottlenose, and we only started selling in 2013. We've raised about $7 million in venture capital funding to date.

We have our headquarters in Los Angeles with 10 people, two people in New York, and a development team of 10, including me, in Amsterdam. Here's a screenshot of one of our visualizations. What we do is real-time intelligence on streaming data, like social data and broadcast data.

This visualization shows a live graph of what's happening around a certain topic. One of the big problems in big data is that there are so many data points and so much analytics that people can't really keep up. Companies are hiring data scientists to figure out what to do with the data, but they don't know either.

There just aren't enough analysts in the world to go through all the data. So another thing we do is automated pattern detection on all of this data. Here is an example where we detected a news article about a United Airlines flight making an emergency landing. This is an example of early news detection, where we beat the mainstream media about 80% of the time, but there are many other use cases, like competitive intelligence, business intelligence, advertising, finance, etcetera.

Under the hood, we ingest raw data, not only social media but also broadcast data; for example, every minute we ingest about 40 hours of video with transcripts. We then perform data mining on that, like sentiment analysis, natural language processing, geocoding, etc., and on top of that we have an autonomous system called trend detection that finds patterns in this data.

The highest layer is the agents layer, where we execute actions based on preset rules. We analyze about three billion messages every hour right now, and we do predictive analytics across 290 million data points every hour. This creates a continuous stream of hierarchical patterns, you could call them signals, that we've extracted from the noise, and those drive many different kinds of applications.

The core technologies we use under the hood: we're big fans of Elasticsearch, and we actually contribute to the Elasticsearch open source project as well. We also use Cassandra and Redis for persistence, and we use RabbitMQ as our queuing system, which we're going to phase out this year. But Node is the thing that ties everything together for us.

All our backend is in Node, with a little bit of Python for mathematics-specific work. So let me give you two examples of where we use Node, which I think are representative of what we do and of what companies dealing with a lot of data do. One example is a specific step in our processing stack called the Trend Detector.

This is one of 30 different steps. The process goes as follows: it waits for a new job on RabbitMQ, it does an aggregation on Elasticsearch, and for each of the items it pulls time series information from Elasticsearch and sends it to a special-purpose Python HTTP instance that does mathematical work like predictive analytics and checks for anomalies. The results then get put back on the RabbitMQ queue. So this is an example where we use Node to just tie everything together, right?

It's the glue.

Another example is a component we call SpiderCrab, which is for spidering links. It extracts all the links from unstructured data, fetches the HTTP content, and extracts the metadata. It stores this in a Redis cache to reduce the number of fetches, and we can do about 150-400 link fetches per second per CPU core. So this is really performant.

On a monthly basis we store about 400 million unique links, and this is an example where Node can really help you massively parallelize an operation.

So let me get a little bit into how we grew as a development team together with the Node community. Here's an example of the "boomeranging" we used to do in the beginning, where your logic nests into these boomerang-like patterns. Nowadays we use async.js, but there used to be many control-flow libraries, some of them very buggy, and we're very happy to finally have a stable async solution for this kind of thing.

We use NGINX as our front-end HTTP server. One of the takeaways there is that you need to make sure you always respond to your HTTP requests. We had to find that out the hard way: if a little exception happens, Node won't respond, and if you're using ip_hash or a proxy, at some point your NGINX will die.

Testing frameworks: there have been many, and we're now completely happy with Mocha. It's stable, we use it on both the browser side and the server side, and we've hooked it up to Travis CI. It works great for us, and it has great little output reporters like Nyan Cat. Chai is the module we use for assertions. There are many flavors of assertion libraries; we like to stick with the assert style, since it's compatible with Node's built-in assert, and there have been too many flavors of assertion languages anyway.

We use Grunt for scripting, Express for all our HTTP framework needs, and countless other npm modules that I could list. We also maintain some of our own, but I think the main takeaway over the years is that the Node community and ecosystem have really matured to a point where you can use them in a stable production environment.

So that concludes my talk. We are hiring, so come talk to me or someone in the dolphin shirts, or mail jobs@bottlenose.com.

Thank you.